Microsoft open sources SandDance, a visual data exploration tool (microsoft.com)
136 points by GordonS on Oct 11, 2019 | 30 comments



This is awesome! I’d missed this news from last week, so I’m glad to see this on the front page. I remember looking at this when it was first released (it was Power BI only at first, right?) and trying to find a way to utilize it in a data journalism project I was working on.

Love that it’s a VS Code extension (and apparently has been for 6 months, shows how in the loop I am) — this looks super slick.

Disclosure: I work at Microsoft but not on this team, as evidenced by the fact that I missed this news (ironically, part of my job is producing a weekly developer news video show — clearly I just missed this).


Very excited that this has been made open source. This was one of my favorite projects to follow back in 2015 or so when I was at MS.

It's a very neat idea. The way I think of the core idea is that you give instructions to all of your rows, and they arrange themselves in a way that high level patterns become visible. That way you can always see the forest and the trees. This is opposed to the current practice of generating new marks from functions on the data to then see high level patterns. In the current practice you can see the forest or the trees.

A very interesting thing happened to me though when I played with this pattern for a while (and this is anecdotal, so take with a grain of salt): it seemed to damage my eyes. I would click around changing the views and the dots would jump in all sorts of directions to the new layouts, and after a while I got headaches and visual aura. Now, it was probably just a coincidence because I was also a bit overworked at the time, but because it was so new, and in some ways so unnatural (to see so many patterns of movement so quickly that you don't generally see in nature), I decided not to take chances and moved away from animated visualizations. Anyone else ever had anything similar happen?


> A very interesting thing happened to me though when I played with this pattern for a while (and this is anecdotal, so take with a grain of salt): it seemed to damage my eyes. I would click around changing the views and the dots would jump in all sorts of directions to the new layouts, and after a while I got headaches and visual aura.

I suppose the animations are there just for show. Just like the iPhone's parallax effect, which I find odd whenever I notice it.

PS: I assume you ruled out photosensitive epilepsy or something serious.


VS Code extension [0] seems pretty good, tested to be fast and useful on 10k records. I see this as a nice way to explore a new dataset prior to a more extensive investigation.

[0] https://marketplace.visualstudio.com/items?itemName=msrvida....


This is a pretty neat package, and it's surprising how much it contains.

Reminds me of the Vega project [1] which never really got as much popularity as it should have.

1. https://vega.github.io/


The SandDance README says it uses Vega for chart layout.


Missed that. Great to see the uptake then.


Strongly recommend that folks interested in data viz investigate Vega further, especially Vega-Lite and related projects such as Voyager, Lyra and CompassQL. As noted elsewhere in the thread, it's great to see it used in SandDance.


Wow! I worked next to an intern team this past summer that was attempting to build SandDance into Azure Data Studio. Super cool to see a big press release from Microsoft mentioning an intern project as one of the major places SandDance is available to use, especially less than 2 months after it was handed off to the full-time employees.


This is pretty cool. It reminded me of Hillview (https://github.com/vmware/hillview), something I worked on a few years ago while interning at VMware research (a few team members in fact used to work at Microsoft Research).

Hillview is a lot less fancy at this point, but the ambition is that it scales a lot better (that is, to way more than 500K rows, on the order of trillions).

Paper here: http://www.vldb.org/pvldb/vol12/p1442-budiu.pdf


Very interesting. "a trillion-cell spreadsheet for big data". Google Sheets IIRC caps at 5M cells.

The HN mod DG built something pretty awesome back in 2009 called SkySheet, perhaps if he's reading he'll let us know what the limits of SkySheet were back then?


Not sure why they put out a press release about this now. This has been out on GitHub for like 6 months, right?

If you haven't seen it, SandDance is a graphing component where the elements of the graph (like the bars in a bar chart) are made up of individual boxes representing all the rows in the actual data. So in a bar chart, you can highlight part of the bar and see the exact data point it came from.
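
To make that concrete, here is a minimal sketch of the "one box per row" idea for a bar chart. This is not SandDance's actual code or API. `UnitMark`, `unitBarLayout`, and `unitsPerBarRow` are names made up for illustration, and the layout rule (fill each bar left to right, then stack upward) is an assumption.

```typescript
// Hypothetical types and names for illustration; not SandDance's API.
interface UnitMark<T> {
  row: T;    // the source data row this box represents
  x: number; // horizontal position, in unit cells
  y: number; // height within the bar, in unit cells
}

// Place one box per data row, grouping the boxes into bars by category.
function unitBarLayout<T>(
  rows: T[],
  categoryOf: (row: T) => string,
  unitsPerBarRow = 4, // assumed: how many boxes sit side by side in one bar
): UnitMark<T>[] {
  const barIndex = new Map<string, number>(); // category -> bar position
  const placed = new Map<string, number>();   // category -> boxes placed so far
  const marks: UnitMark<T>[] = [];

  for (const row of rows) {
    const category = categoryOf(row);
    if (!barIndex.has(category)) {
      barIndex.set(category, barIndex.size); // bars appear in first-seen order
    }
    const bar = barIndex.get(category)!;
    const n = placed.get(category) ?? 0;
    placed.set(category, n + 1);

    marks.push({
      row,
      // The +1 leaves a one-cell gap between neighbouring bars.
      x: bar * (unitsPerBarRow + 1) + (n % unitsPerBarRow),
      y: Math.floor(n / unitsPerBarRow),
    });
  }
  return marks;
}

// Each mark keeps a reference to its row, so selecting a box
// leads straight back to the data point it came from.
const demoMarks = unitBarLayout(
  [{ city: "Oslo" }, { city: "Lima" }, { city: "Oslo" }],
  (r) => r.city,
  2,
);
console.log(demoMarks.map((m) => `${m.row.city} @ (${m.x}, ${m.y})`));
```

The bar's height falls out of how many rows share the category, so the aggregate shape and the individual rows come from the same marks.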

I appreciate MS releasing it and it is a nice representation for some types of interactive visualizations. But since it is a "modern" tool written as a jumble of web views, javascript and React, it isn't exactly the most performant thing.


2 months ago I discovered Kepler [0], a tool by Uber for visualizing geo data. It has been of tremendous help when analyzing large geospatial datasets. It allows me to quickly visualize the results of my computations, and spot anomalies and patterns that I wouldn't have noticed without it.

This seems to be a similar tool, but for charting non-geographic datasets. It should also be extremely useful.

I am glad companies are making these tools available for the masses.

[0] https://kepler.gl/


Both Kepler and SandDance are built on top of another Uber creation called https://deck.gl


They probably thought they were on the trail of something great but came up with something merely neat to play with, which is probably why it is now open source.



> The release is comprised of several components that work in native JavaScript or React apps

This is worded poorly; at first I thought it meant React Native. I can only tell that's not what it means because "native JavaScript", despite being a weird term, wouldn't refer to that.


A company I worked for built this almost 10 years ago. Here is our video from SIFMA where we gave a joint presentation with Microsoft. Interesting.

https://www.youtube.com/watch?v=7KBaX_t_hvM


> SandDance shows every single row of a dataset (for datasets up to ~500K rows)

That is most impressive. Their online demo[1] has about 25K rows.

[1] https://sanddance.js.org/app/


I guess I'm a little slow, but I was surprised that a plot of latitude and longitude would generate something that looked like a map.

Of course thinking about it, it makes complete sense but I was genuinely surprised at first.
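
For anyone else who was surprised: plotting longitude on x and latitude on y is the equirectangular (plate carrée) map projection, so the points land where they would on a simple flat map. A rough sketch, where `toScreen` and the canvas size are made-up names and values for illustration:

```typescript
// Hypothetical helper: treat longitude as x and latitude as y.
// This linear mapping is the equirectangular (plate carrée) projection.
function toScreen(
  lat: number, // degrees, -90 (south) to 90 (north)
  lon: number, // degrees, -180 (west) to 180 (east)
  width: number,
  height: number,
): { x: number; y: number } {
  const x = ((lon + 180) / 360) * width;
  // Screen y grows downward, so north (lat = 90) maps to the top edge.
  const y = ((90 - lat) / 180) * height;
  return { x, y };
}

// A point on the equator at the prime meridian lands in the middle of the canvas.
console.log(toScreen(0, 0, 360, 180));
```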


What would be some unique uses for this that one perhaps couldn't accomplish with more traditional visualizations, any ideas?


Is this any good for non-spatial stuff? The graphs look very spatial-y

Either way...nice one MS


Yet another example of Megacorp attempting to swallow smaller fish (Tableau), which showed a great example of leading the market with something innovative and high quality. I haven't used PowerBI, but I have a feeling already that this is going to work great for Microsoft.


Tableau did very well and was acquired by Salesforce. Looker went to Google, Periscope went to Sisense, and there are still dozens of new upstarts.

I don't see how a free open-source visualization library is going to harm these companies; if anything, it should help new startups with their own products.


But isn't Tableau owned by Salesforce now? That evens the playing field some.

Also, this is a free and now open source tool from Microsoft Research that is independent of PowerBI.


Also it's open source, which Tableau is not.


Tableau's licensing model/architecture is fairly draconian. You can't even test it out for free without giving them public access to all the data you use to test, and you can't embed it in a web app without having a dedicated Tableau server process backing it. It's possible these constraints make sense for enormous enterprise customers, but they make it impractical for small companies.


They have a 30 day eval that’s pretty decent for testing.


Are you finding a way to complain about an open source tool? Really?



