Visualizing Algorithms (ocks.org)
1557 points by callum85 on June 26, 2014 | 88 comments



Author here, ask me anything. And don’t miss the related work section at the end — there’s a ton of links there to inspiring work.


Have you ever considered visualizing artificial neural networks in training or in operation?

Speaking purely from a selfish standpoint, it would be awesome to SEE what back-propagation "looks like" with different neuronal activation functions, or what feature learning by restricted Boltzmann machines "looks like," or how dropout causes networks to generalize better -- to name just a few possibilities.

If anyone can visualize neural network algorithms in a way that is intuitive and beautiful, it's you!


I haven’t done much with neural networks, but Christopher Olah wrote a fantastic post on visualizing low-dimensional ones:

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/


I have found this useful for helping visualize what's happening inside a neural net: http://cs.stanford.edu/people/karpathy/convnetjs/


If you're interested in visualizing RBMs or AutoEncoders in real time, you should check out my project VisualRBM:

https://code.google.com/p/visual-rbm/wiki/Screenshots

It basically lets you do exactly what you're looking for. I'm going to eventually add support for visualizing each layer of a DNN during back-propagation training, but that's several releases away.


Seconded. I was recently trying to find a good resource to simply and concisely explain (or at least give a 'feel' for) artificial neural networks to someone and came up blank. Everything I found either assumed too much domain-specific knowledge or was too long.


Is there a "feel" to neural networks? You define a bunch of hidden/latent variables connected in an array (or multiple arrays), then perform a heuristic search of the weight state space to minimize some error/energy function. Sometimes it works, often not.


I love how you're getting downvoted when the truth is NNs work well _only_ when you have a team of specialists to select their hyperparameters. There's a rich vein of research for AutoML to automatically learn the hyps.


I don't know if this is the type of thing you're looking for, BUT... I made this little web app for visualizing the prediction gradient for a binary classification problem with two-dimensional inputs [0]. You basically add members of each class by clicking in the plane. It has some bells and whistles for viewing the decision gradient (the probability of any given point in space being of one class or the other) and how this changes after each training. You can also do things like change the learning rate, edit weight values, and add noise to weights.

In this toy problem, you can get more of a "feel" for the complexity capability of a network (by trying different clusterings of classes, etc...). Unfortunately, I hard coded it to have 2 hidden units. In retrospect, it would have been better to make the number of hidden units tunable as well so that one could visualize how a network with more non-linearities can draw increasingly complex decision boundaries.

[0] http://www.math.fsu.edu/~mhancock/#!/software/web-apps/neura...


Most d3 books out there are entry level. They don't cover the advanced concepts needed to build serious visualizations. The only way to learn the advanced stuff is to go through your bl.ocks, which can be daunting without comments. It'd be awesome if you would consider writing an advanced book. I know it's too much to ask for, especially considering your active contribution on Google Groups and StackOverflow.


I’ve made a couple starts on books but so far haven’t been able to stick with it long enough to finish; it becomes too tempting to release content as examples or smaller articles. And while those are useful, too, piecemeal content lacks the cohesiveness / comprehensiveness / depth of a book… so I’ll keep trying and hopefully find a way to make a book in the future.

(And by “book”, I’m including online publications like Mark Pilgrim’s excellent /Dive into HTML5/. I’d almost certainly publish online given the interactive nature of the subject.)


If it would help, I'm 100% positive that a large group of people would be happy to contribute to a kickstarter campaign to raise funds for you to be able to focus on a book and/or further development of d3 for a year. There's been a few successful kickstarters like this already (Hello Ruby book, git-annex project, and many others come to mind), so it can certainly be done (of course this only helps if the main obstacle is the "everyday" work getting in the way).

In any case, thank you for all your work. It's an inspiration.


+1 I would contribute. Seriously you should think of how much money you would need to incentivize you to really focus, then quintuple it, then post a kickstarter.


+1 this. I'm on a Kickstarter hiatus right now, but I'd throw into this on Mike Bostock's name alone.


+1. I'd chip in for sure


+1 I'm interested in learning some advanced visualization, graphics, and animation as well. Any good book recommendations would be appreciated.


I'll throw money at you, for sure!


this is only slightly tongue in cheek, but do you ever get tired of being so awesome? Seriously. this shit is amazing.

thanks for all the time/effort you've put into d3. my side project (machete.io) certainly wouldn't be possible without it.


Ha, thank you. The trick to appearing awesome is to surround yourself with people doing awesome things, and then figure out how to synthesize their ideas into something new and (hopefully) transformative. Most of the ideas & forms in my article aren’t new, but I tried to weave them into a broader look and derive some interesting variations.


Oh man, that's awesome! I see I'm not the only one with an addiction to d3 and knockout. I've actually just recently gotten around to playing with the two of them at once: http://jsbijn.com/gitufejo/3/edit?html,js,output


Apparently copy+paste is too hard.... http://jsbin.com/gitufejo/3/edit?html,js,output


Appears you have an extra closing parenthesis on line 3.


Awesome side project. I see you're using Crossfilter. Sports data seems like a natural fit. As a huge football (American) fan, and former player in college, I know how valuable down/formation data can be when trying to come up with a game plan. It would be interesting to see who's signing up for your beta ;)


How many hours did it take for you to make this page? (even after you created your initial talk.)

I love these sorts of interactive, visual essays and would love to find ways to lower creation times.


Writing always takes longer than I expect. The issue wasn’t technical (it wasn’t too hard to take my examples from bl.ocks.org and embed them on a page). It just required more thought to take my speaker notes and translate them into prose. You can get away with more hand-waving in speech than in writing.

Also, there were a few things in my talk that I wanted to fix, like replacing the rainbow color scale with something more effective.

The examples themselves I worked on intermittently, typically for an hour or two in the evenings. I got interested in mazes as an analogy for design process when I gave a talk at OpenVis earlier in the year.


> Writing always takes longer than I expect.

If it makes you feel better, the time was well spent. I found the prose lucid yet concise, and I know it takes a lot of refinement to get to that.


>it wasn’t too hard to take my examples from bl.ocks.org and embed them on a page

That too was interesting. You prompt the reader to view the source code of the page, which led me down the rabbit hole of !function {}() in javascript.
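For anyone else heading down that rabbit hole, here is a minimal sketch of the idiom. It is not taken from the article's source; the `log` array and the `hidden` variable are made up for illustration. The leading `!` makes the parser read `function` as an expression rather than a declaration, so it can be called on the spot.

```javascript
// Hypothetical names throughout; this only illustrates the idiom.
var log = [];

// Without the "!", "function() {...}()" is a syntax error, because a statement
// that starts with "function" is parsed as a declaration.
!function() {
  var hidden = "local to this function"; // does not leak into the outer scope
  log.push("ran immediately");
}();

// The more common spelling wraps the function in parentheses instead.
(function() {
  log.push("ran immediately too");
})();
```

Either way the point is the same: the code runs once, right away, and its variables stay private.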


Beautiful! These would be very useful to someone learning these algorithms, but also to someone interested in creating their own visualizations. I like how you highlight the drawbacks of different types of visualizations and demonstrate that different visualizations can have different goals, and offer creative solutions for both.

One nitpick with the maze visualizations though. I found that the maze color flood animations have the same issue that you mentioned sorting has: animations are frustrating to watch because you have to wait and then rely on memory to recognize patterns. Specifically, I found the color scale rotation was much too fast to see large patterns, and even small patterns were too dense to be able to trace backwards after the maze had been fully colored.

I have an idea for an alternate visualization: Only show the fully colored maze (no intermediates), but vary the color rotation length over time from frame to frame. You'd be able to see color rippling through the maze and be able to follow the ripples over both large scale and small scale features.


Thanks for the feedback. There’s a similar idea I also want to try, which is to use color cycling for the mazes. Like this:

http://www.effectgames.com/demos/canvascycle/

Varying the rotation length sounds interesting and, like you suggested, could be great for seeing both micro and macro features. There was a bug previously where the Prim’s visualization rotated twice as fast, and it looked quite different!



Color cycling would be great too, especially for micro features.


Sorting-algorithm animations have a long history, and, when I was younger, I thought these animations were fun to watch, but I never learned anything from them. At this point in my knowledge, I wouldn't expect to learn anything from a visualization unless I had already studied the algorithm quite closely and I had a specific question I wanted to answer. Even then, it's a shot in the dark; most algorithms don't lend themselves to being visualized. The animations are very pretty, though -- who knows, maybe they will help someone.


I think sorting animations could help someone with a purely technical understanding make the jump to an intuitive understanding, but I agree, it would be a stretch to say that it could teach them the algorithm by itself.


Hard to say. What helps me understand algorithms is lots of mental effort.


In case you're revisiting this post later ( 17 hours elapsed when I got to your article )

.

1. Recording the missteps to Perfection

.

In future could you record the number of edits required to come up with your finished essays ? I've always found that it takes a lot of steps to make something that elegantly looks as though no mis-steps were taken. Some form of screen recording style evolution of the essay over time

I know Paul Graham had a live essay session recording.

.

2. Archiving live and interactive works.

.

I guess your piece will archive okay as it is the sum of static files. But I do hope that your essay plays well with archive.org so that future people can enjoy your article.

.

3. The future of the interactive essay

.

Relating to 1. I can't imagine that preparing an interactive essay is an easy affair. A lot of effort goes into a static essay. How much work is it to interactivate it? Do you feel it will become more widespread as an essay form or restricted to a select bunch of interactivists ?

E books - smartphones really make interactive essays possible to disseminate.


1. Ah, but how to define an “edit”? The git repo for the talk had 69 commits, but they varied greatly in size. There’s also the history of the individual examples (which could be crawled from GitHub Gist), the history of the write-up (which I squashed on merge, sorry!), as well as various notes scattered while I was still figuring out the topic.

2. I hope it archives well — there are no dependencies on other sites, though it loads resources via JavaScript. The site is backed by a repo on GitHub. (Though I should point out that I retain copyright on my personal website, despite the source being viewable.)

https://github.com/mbostock/bost.ocks.org/tree/gh-pages/mike...

We have a similar concern with published graphics on The New York Times. It’s funny and sad now how so many animations on University course websites are practically unviewable because of waning support for Java; I expect it will be similarly awkward to run Flash plugins in ten years. On the other hand, content written to web standards seems to have a longer shelf-life, as the standards are widely supported by many organizations, not just one. So my hope is that standards-based graphics will both archive well and continue to run on evolving browsers.

3. Yes, it’s already the case that graphics (and further interactive graphics) are increasingly integrated with prose, rather than being relegated exclusively to standalone content. That’s not to say standalone graphics are bad — there are many viable forms for graphics, and sometimes you want it to be standalone — but that we’re figuring out ways to integrate “multimedia” more elegantly and less gimmick-ly.

I still think the hard part is expressing the ideas & communicating effectively rather than the technology. Designing interaction is hard because there are so many ways to do it, and you don’t always know what will be intuitive to readers.


Fantastic article! I am a big fan of your work. Some questions:

- any plans on trying to make a canvas based d3 adapter/library? Also thinking about webgl here, although I believe x3dom works well enough

- have you played with other programming languages to evaluate their support for data vis expressiveness? If so, what would you recommend to try out?


Re. Canvas, yes and no. On the one hand, D3 (data-driven documents) is intrinsically tied to the DOM and I don’t want to compete with standard representations like HTML, SVG and CSS. On the other hand, D3 supports Canvas for geographic projections and we could expand that to include other geometries like d3.svg.{line,area,symbol} and d3.geom.voronoi. But it’s only worth dropping into Canvas if there’s a significant performance bottleneck, and often there isn’t. Jason and I are working on the geometry pipeline so we’ll see. X3DOM also fits well with D3, though I tend to focus on 2D visualization.


First, big thank you for D3, it's an amazing piece of work.

That said, DOM/SVG performance is still a problem with mobile browsers. We tried to build a D3-based visualization tool that would work well on mobile. Panning and zooming are natural gestures on mobile and often useful because of the limited screen real estate, and with them you hit rendering performance problems easily.

With proper use of CSS transforms, culling of data points and intelligent redrawing, you can get the performance to adequate levels on iOS Safari, but you have to throw away the pure data-driven documents approach and start to think about it more from the "rendering pipeline" angle.


> it’s only worth dropping into Canvas if there’s a significant performance bottleneck, and often there isn’t

Well actually, I find that when displaying more than a few hundred polygons (or even simpler elements, actually) then SVG often becomes problematic (especially on Firefox).


Have you tried batching updates? Reducing the number of repaints? Most likely you can bump it to at least several thousand polygons if you reduce the number of DOM updates.


It depends a lot on the complexity of the polygons of course, and my "a few hundred" was a bit low. What I am actually thinking of (what I'm often doing) is displaying data about France, and no matter what you do, one polygon for each of the 36000 communes, or even for each of the 3000 cantons, is always too much for Firefox. Using a canvas greatly limits interactivity though.

Thinking about it, maybe a good solution (to have both a detailed map and tooltips or other contextual things) would be a canvas with precise borders, overlaid with simplified transparent SVG polygons (say, a Voronoi diagram from each polygon centroid, or something).


Not to be completely contrary - in fact, I much prefer SVG + D3 to canvas - but the canvas actually can support quite complex interaction and be performant if you put the time into it.

Take a look at OpenLayers 3. They have a few examples they use as benchmarks for canvas rendering: http://ol3js.org/en/master/examples/synthetic-lines.html http://ol3js.org/en/master/examples/synthetic-points.html

Granted, these absolutely crush Firefox (Chrome handles them fantastically; IE about average), but they're still great examples of how performant the canvas can be. As far as interactivity goes, all you need is a little bit of extra attention to your events and rendering, and it works just as well as, if not better than, SVG in many cases. Look at http://ol3js.org/en/master/examples/draw-and-modify-features... for a good example.

And you're right regarding SVG overlays on a canvas - it actually works quite well: http://ol3js.org/en/master/examples/d3.html

Of course, OL3 is both highly specialized to mapping applications (you know, being a mapping library and all), and highly optimized for canvas rendering, but it does serve to show how flexible canvas can be.

(edit for formatting)


Here's a technique for adding hover/click events to a canvas map. Use a second hidden canvas which has a unique rgb color for each of the regions. Invert the x/y mouse position to lat/lon, lookup the color in the hidden canvas, then lookup the region using that color.

You can make the hidden canvas 2x larger than the visible canvas to get good precision on the borders of regions. It's admittedly hacky, but has good performance characteristics since it only uses two canvas elements.

http://bl.ocks.org/syntagmatic/6645345 (hover only activates on mousemove, so the selected region can slide off the cursor)
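The core of the color lookup can be sketched roughly like this. It is not the code from the linked example; `idToColor`, `colorToId` and `toCss` are hypothetical helpers, and id 0 is assumed to be reserved for "no region" (the untouched background).

```javascript
// Encode a region id (0 .. 16,777,215) as a unique RGB triple.
function idToColor(id) {
  return [(id >> 16) & 255, (id >> 8) & 255, id & 255];
}

// Decode an RGB triple read back from the hidden canvas into a region id.
function colorToId(r, g, b) {
  return (r << 16) | (g << 8) | b;
}

// CSS color string to use as fillStyle when drawing region `id`.
function toCss(id) {
  var c = idToColor(id);
  return "rgb(" + c[0] + "," + c[1] + "," + c[2] + ")";
}

// In a browser, assuming `hiddenCtx` is the 2D context of the hidden canvas and
// each region was filled with toCss(regionIndex + 1), a lookup would be roughly:
//   var p = hiddenCtx.getImageData(x, y, 1, 1).data;
//   var id = colorToId(p[0], p[1], p[2]); // 0 means no region under the cursor
```

One caveat with this sketch: anti-aliased edges can blend two fill colors into a value that decodes to the wrong id, which is part of why the larger hidden canvas helps near borders.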


Slight tangent: in those cases where canvas is desired only for the ability to convert a visualization into a PNG (rather than performance), it is possible to serialize a D3-created SVG into an XML string and render it to an image/canvas. The process is straightforward for Chrome and Firefox, but needs the canvg library for Safari.


First of all, amazing article!

However, I believe that the bit about light being a "continuous" signal in the first paragraph invites conflicting thoughts of wave / particle duality, which distracts from and is not at all relevant to your point.

The eye samples light because it connects to a machine with a fixed number of inputs. This setup would also require sampling if the signal were not continuous at all, but instead consisted of a much larger number of discrete parts than the "sensing" equipment could handle.


Yes, you can consider light a discrete signal, too. I glossed over this because resampling a discrete signal is harder to explain than sampling a continuous signal; I think about it as first reconstructing a continuous signal (typically by convolution) and then sampling it, as described here: http://graphics.stanford.edu/courses/cs248-05/samp/samp1.htm...
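A minimal one-dimensional sketch of that two-step view, assuming a tent (triangle) kernel for the reconstruction, which amounts to linear interpolation. The function names are made up for illustration, and the linked course notes cover better kernels.

```javascript
// Step 1: reconstruct a continuous function from discrete samples by
// convolving with a tent kernel of half-width 1 (sample i sits at x = i).
function reconstruct(samples) {
  return function(x) {
    var sum = 0;
    for (var i = 0; i < samples.length; i++) {
      var weight = Math.max(0, 1 - Math.abs(x - i)); // tent centered on sample i
      sum += weight * samples[i];
    }
    return sum;
  };
}

// Step 2: sample the reconstructed signal at m evenly spaced positions.
function resample(samples, m) {
  var f = reconstruct(samples), out = [];
  for (var j = 0; j < m; j++) {
    var x = m === 1 ? 0 : j * (samples.length - 1) / (m - 1);
    out.push(f(x));
  }
  return out;
}

resample([0, 10, 20], 5); // [0, 5, 10, 15, 20]
```

This only behaves well when upsampling; for downsampling the kernel would need to be widened to limit aliasing.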


Can someone clue me in on the usage of bitwise operators when declaring the cardinal direction variables (NSWE)? I can't make sense of it -- those values don't seem to make use of any bitwise operations further down the line, unless I'm missing something, which is very possible.

Is this just a stylistic choice to signal the uses for those variables? (Though I confess I still don't know what that signal would be, anyway.)

Any illumination would be appreciated, but should MBostock still be following this, thanks so much for your work! So inspiring.


Sure, it’s a bit field, which is an efficient way to store multiple booleans (bits) in a single number.

https://en.wikipedia.org/wiki/Bit_field

The maze is defined as a rectangular grid of cells, where each cell is a bit field specifying whether you can navigate from that cell to each of its four neighbors: the cell above (N), the cell below (S), the cell to the right (E), and the cell to the left (W). The bit masks are powers of two (N = 1 << 0 = 1, S = 1 << 1 = 2, W = 1 << 2 = 4, E = 1 << 3 = 8) to uniquely assign each bit to each of the four directions.

For example, say that you have a cell that’s open to the north and the south, as part of a vertical passage. The bit field therefore is 0011. To check whether you can go south from the cell, you use the bitwise AND (&) operator: 0011 & S = 0011 & 0010 = 0010 = truthy. To likewise check whether you can go east from the cell: 0011 & E = 0011 & 1000 = 0000 = falsey.
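The same worked example as a small sketch. The masks are the ones given above; the `isOpen` helper and the `cell` variable are hypothetical names added for illustration.

```javascript
// One bit per direction, as defined above.
var N = 1 << 0, // 0001
    S = 1 << 1, // 0010
    W = 1 << 2, // 0100
    E = 1 << 3; // 1000

// Hypothetical helper: true if the cell has a passage in the given direction.
function isOpen(cell, direction) {
  return (cell & direction) !== 0;
}

var cell = 0; // all four walls closed
cell |= N;    // open the passage to the north
cell |= S;    // open the passage to the south: cell is now 0011

isOpen(cell, S); // true:  0011 & 0010 = 0010
isOpen(cell, E); // false: 0011 & 1000 = 0000
```

Closing a passage again would be `cell &= ~S`, which clears that one bit and leaves the others alone.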


Beautiful article. Is there a repository with the source code available? You mention viewing the source in the article but my mobile browser can't do that.



Thank you for your work. Very impressive. Examples like these can help my students learn these algorithms easily.


you said ask me anything :)

What about visualizing randomness?

Like in this article ? http://lcamtuf.coredump.cx/oldtcp/tcpseq/print.html


thanks for producing such beautiful work! I've used these visualizations to explain what a programming thought process is like.


Off-topic: Why isn't my macbook a jet engine right now? I've read plenty of blogs with nothing but a parallax scroll at the top and my computer fan goes insane.

But, on this blog, TONS of dynamic code running and not a peep.


> Why isn't my macbook a jet engine right now? I've read plenty of blogs with nothing but a parallax scroll at the top and my computer fan goes insane

Ah, the beauty of Javascript!


Fits right in with my rapid-temperature-rise-based workflow!


Obligatory: http://xkcd.com/1172/


Precisely what I was thinking by the end of the post.


Funny thing about that random comparator shuffle - Microsoft used it for their browser selection screen (part of the EU antitrust settlement). Oops!

http://www.robweir.com/blog/2010/02/microsoft-random-browser...


Breathtaking work!

Found a ton more interesting examples here: http://bl.ocks.org/mbostock



Power of visuals, so pretty.


This was really great.

Especially maze turning into spanning tree. That one was truly mind blowing.


I could watch that for hours straight.


This one is also interesting (requires WebGL) http://ottoallmendinger.github.io/js-quickhull3d/


+1 if you didn't understand a thing but kept scrolling for the sweet, sweet animations


Here are a couple of papers you'll probably enjoy:

Design and implementation of the UW Illustrated compiler by Andrews, Henry, and Yamamoto PLDI '88

and

The University of Washington illustrating compiler by Henry, Whaley, and Forstall PLDI '90


I can't find any copies of these papers that aren't behind a paywall. Is there any record of what the visualizations looked like?


Great piece of work!

I've seen in the piece of code corresponding to the Fisher-Yates algorithm this snippet: "n-- | 0". Has the "| 0" any importance?


x | 0 will truncate a floating point x to an integer. The multiplication happens first so the overall expression acts like i = (Math.random() * n--) | 0.
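Here is a sketch of a Fisher–Yates shuffle built around that expression. It follows the general shape of the algorithm rather than quoting the article's code, and the optional `random` argument is an assumption added so the behavior can be checked with a fixed source of randomness.

```javascript
// Shuffle an array in place. `random` (hypothetical parameter) defaults to
// Math.random and must return a number in [0, 1).
function shuffle(array, random) {
  random = random || Math.random;
  var n = array.length, t, i;
  while (n) {
    // Parsed as (random() * n--) | 0: pick an index in [0, n), then shrink n.
    i = random() * n-- | 0;
    // Swap the picked element into the shuffled tail of the array.
    t = array[n];
    array[n] = array[i];
    array[i] = t;
  }
  return array;
}

3.7 | 0;  // 3
-3.7 | 0; // -3 (truncates toward zero, unlike Math.floor)
```

Note that `| 0` converts to a 32-bit integer, so the trick only holds for values within that range, which is not a concern for array indices of this size.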


This is actually the only way to do integer arithmetic in Javascript (without the array types introduced by WebGL). It's how asm.js signals integer math as well: https://en.wikipedia.org/wiki/Asm.js#Examples


Anyone interested will probably also enjoy Mike's talk “Design is a search problem” from Openvis conference: http://www.youtube.com/watch?v=fThhbt23SGM


This is incredible. As a non-CS major, it's been truly a fascinating read!


Really is interesting how proper visualization allows you to see very subtle probabilistic distinctions that probably only a phd in stochastic processes would understand intuitively.


I wonder about visualizing the algorithms that visualize the algorithms. Being serious and curious here.



Tread carefully, for you risk breaking the universe.


Beautiful...just to understand that algorithms also paint beautifully ...visualising algorithms awesome!


How did you people learn these things without visualizations?


The web is truly the new <canvas> for the artist.


I'm thinking the same thing.


Great job, this both reads and looks great!


well this excites me now. can anyone help me master the basics or suggest any good book?


please recommend a good book that would assist me with the basics.


Hug of death?


How many times has this been on HN..

https://news.ycombinator.com/item?id=7822983 https://news.ycombinator.com/item?id=7652333

I know it's more times but I haven't found those threads..



