
Introduction to D3 - kickout
https://observablehq.com/@mitvis/introduction-to-d3
======
warpech
Technically, D3 is a JavaScript library, but in reality it is much more than
that. In this article, it is called a "visualization grammar". I have heard it
called the "jQuery of diagramming", a "declarative DSL for data visualization",
and so on.

Since 2011, there have been numerous cycles of best-of-breed JavaScript SPA
frameworks and libraries (I'm talking everything from jQuery, through
AngularJS, to React), but in data visualization, D3 seems to hold its position
very well.

What are the things that D3 did right to earn the acclaim, and why has it kept
it for so long?

D3 is approaching 10 years since its initial release. Has it stood the test of
time, and will it keep its power for the next 10 years?

~~~
jcranmer
> What are the things that D3 did right to earn the acclaim, and why has it
> kept it for so long?

Once upon a time, there was an infovis library for Java called Prefuse. Then
the developers abandoned that library for a Flash infovis library called
Flare. A JS library called Protovis was inspired by Flare, and then Protovis
itself was replaced with D3.

I've used 3 of those libraries (never wrote any Flash code). The only one I
would actually recommend people use is D3--it's literally the first (honestly,
only) infovis toolkit I've used that doesn't feel like pulling teeth to get
anything done.

What D3 does differently from its lineage is that it is data-centric. Each
datapoint is an object, corresponding to a DOM node, and the layouts in D3
don't actually draw anything; they just set properties on the object--you're
responsible for actually converting those properties into SVG attributes for
display.
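A minimal sketch of that split, using d3.pie() as the layout (assuming a
recent D3 with `selection.join` and an existing `<svg>` element; the data and
sizes are made up):

    // d3.pie() is a layout: it only annotates each datum with angles.
    const data = [{value: 10}, {value: 25}, {value: 65}];
    const arcs = d3.pie().value(d => d.value)(data);
    // Each element of `arcs` now carries startAngle/endAngle -- nothing is drawn yet.

    // Converting those properties into actual SVG is your job:
    const arcPath = d3.arc().innerRadius(0).outerRadius(100);
    d3.select("svg").append("g")
        .attr("transform", "translate(100,100)")
      .selectAll("path")
      .data(arcs)           // one datum per DOM node
      .join("path")
      .attr("d", arcPath)   // turn the angles into an SVG path string
      .attr("fill", (d, i) => d3.schemeCategory10[i]);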

It sounds like this would be a nasty headache, but this way is actually much
better than the traditional way of handling things (where the visualization is
essentially treated as a display widget in a normal GUI toolkit). Any time you
want to progress beyond a basic "display a graph with static data," you start
to need a lot more fine-grained control over display elements. Want to link
multiple views of the same dataset, so that clicking a point in one view
highlights it in all the others? That's _very_ hard in Prefuse, but quite easy
in D3.
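A hypothetical sketch of that linking (assuming D3 v6+ listener signatures;
the class names are made up), relying on every view being bound to the same
datum objects:

    // Clicking a point highlights the same datum in every view that renders it.
    d3.selectAll(".point").on("click", (event, d) => {
      d3.selectAll(".point").classed("highlighted", p => p === d);
    });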

~~~
andybak
Conversely, you often see D3 recommended to people who just want to draw a bar
chart. That's a steep learning curve and a lot of cognitive overhead if you
just want to do something fairly conventional.

Luckily there's now plenty of d3 wrappers to cover common cases.

~~~
lucasverra
If you have experience, which 3 would you recommend for business-as-usual
presentations?

~~~
landtuna
Vega-Lite is nice: [https://vega.github.io/vega-lite/](https://vega.github.io/vega-lite/)
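For example, a basic bar chart is a single declarative spec (a hypothetical
sketch using vega-embed; the `#vis` element id and the data are made up):

    vegaEmbed("#vis", {
      $schema: "https://vega.github.io/schema/vega-lite/v5.json",
      data: {values: [{label: "A", count: 5}, {label: "B", count: 20}]},
      mark: "bar",
      encoding: {
        x: {field: "label", type: "nominal"},
        y: {field: "count", type: "quantitative"}
      }
    });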

------
Vaslo
Used D3 for a grad data viz course project a year or two ago. It's extremely
powerful, but unless you are doing custom visualizations or are excellent at
JavaScript and visualization, it's a bit overkill. It's much easier to do
monthly reporting or even one-off stuff in Tableau, Power BI, or the like.
Tableau or Power BI work I could pass to other analysts, and even without
experience they could figure it out. If I sent them my d3 code they would cry.

Again, if you're doing advanced visualizations for a large newspaper or a
commercial presentation, it's extremely customizable and makes some really
beautiful charts.

~~~
sailfast
Using a non-proprietary platform also has the distinct advantage of supporting
whatever APIs you want to use and not having to deal with extracts of data.
Having to wait for one of these platforms to support JSON from a web URL is
really quite silly when you could write the query quickly (in Python or
whatever) and pass the values to your view.

Or, you know, use PyGal or another server-side charting library:
[http://www.pygal.org/en/stable/](http://www.pygal.org/en/stable/)

~~~
inferiorhuman
_Or, you know, use PyGal or another server-side charting library:
[http://www.pygal.org/en/stable/](http://www.pygal.org/en/stable/)_

Is there much of a market for server-side rendering? I've been putzing around
with a d3-inspired charting library in Rust mostly as a brain teaser though.

~~~
dreamcompiler
I very much prefer to generate SVG charts on the server. Not only is this more
sensible when you have a huge number of data points, but it even works with JS
turned off (unless you need interactivity).

~~~
sings
Not disagreeing – I do this often too – but the downside can be having a very
large dataset which in turn generates a very large response, which _might_ be
more efficiently sent as data and constructed client-side. Less importantly,
Google's Lighthouse tests for a certain number of DOM elements, which a
complex chart can easily exceed.

~~~
inferiorhuman
Yeah, my server-side use cases are for times when I'd want to use a graph
outside of a browser (e.g. PDF reports, printing, email distribution). If
you're targeting a browser, IMO it only makes sense to move away from the
browser and/or JS if you're trying to create a static raster image.

------
wattenberger
There are tons of different d3 modules (40+). As someone who uses d3
extensively, but rarely uses its selection and data-binding functionality, I
put together a birds-eye view of the different modules. There's tons of great
functionality that usually gets skipped over in favor of the DOM-manipulation
methods: managing colors, dates, data munging, etc.

[https://wattenberger.com/blog/d3](https://wattenberger.com/blog/d3)
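For a taste of those non-DOM modules (a rough sketch, assuming the standard d3
v7 bundle; the data is made up):

    const parseDate = d3.timeParse("%Y-%m-%d");  // d3-time-format
    const rows = [
      {date: parseDate("2020-01-01"), value: 3},
      {date: parseDate("2020-02-01"), value: 9},
    ];

    d3.extent(rows, d => d.value);  // [3, 9] -- d3-array
    d3.mean(rows, d => d.value);    // 6

    // d3-scale / d3-scale-chromatic: map values to colors, no DOM involved.
    const color = d3.scaleSequential(d3.interpolateViridis).domain([0, 10]);
    color(9);  // a CSS color string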

~~~
lioeters
That is an excellent set of articles for learning d3!

I love how it gives an overview of all the d3 modules, then explains them in
groups by related functionality. I just started exploring, and will study it
over time.

Thank you for sharing your knowledge, the articles are really well-done and
high quality. I'm guessing the visualization of d3 modules is done in d3
itself. Beautiful in concept and presentation.

------
factsaresacred
I switched from D3 to echarts[0] and never looked back. It's still powerful
and customizable, but comes with a much easier API to reason about.

[0] [https://github.com/apache/incubator-echarts](https://github.com/apache/incubator-echarts)
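For comparison, a whole basic chart in echarts is one declarative option
object (a hypothetical minimal sketch; the container id is an assumption):

    const chart = echarts.init(document.getElementById("chart"));
    chart.setOption({
      xAxis: {type: "category", data: ["A", "B", "C"]},
      yAxis: {type: "value"},
      series: [{type: "bar", data: [5, 20, 36]}],
    });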

~~~
ggregoire
Same here. I still haven’t found anything I couldn’t do with echarts. D3 is
always highly upvoted on HN, though it’s overkill in 99% of use cases in my
opinion.

------
boringg
I really enjoyed d3 for the most impressive data viz products. However, I
always found the overhead of getting d3 going and piping data to the right
places to be burdensome for most applications. My sense was that you would use
d3 as a final polishing step for a data viz project/product, or if you wanted
to make a data product that had complicated requirements/display needs. I
would be curious to hear if anyone uses it for data exploration, or if the
overhead has been lowered over the last couple of years (it's been 3 years
since I've used it).

------
suyash
Link to the actual course with all the details:
[http://vis.csail.mit.edu/classes/6.894/](http://vis.csail.mit.edu/classes/6.894/)

------
jonathankoren
Maybe I'm in the minority, but whenever I worked with D3 to do anything beyond
the simple bar charts this shows, I ended up having to manipulate the SVG DOM
manually, which quite frankly sucks.

~~~
bhandziuk
This is my experience too. Once you start wanting to do anything that isn't
just out of the box, you need to know the ins and outs of SVG, at which point
you may as well just make your own SVGs in JS--which is what I've ended up
doing.

------
gargarplex
While I'm sure the authors poured a lot of effort in, I found this tutorial
difficult to follow despite the neat "notebook"-style webpage.

This was remarkably more intuitive and clear, and I breezed through it:

[https://alignedleft.com/tutorials/d3](https://alignedleft.com/tutorials/d3)

------
artur_makly
When we launched [https://VisualSitemaps.com](https://VisualSitemaps.com), we
decided to use D3 since it rendered the site-mapping dataviz[1] really well
and even allowed for real-time manipulation (try dragging and dropping the
nodes in the demo below).

However, performance does degrade once you go beyond 3000 nodes of data, so we
are now in the process of rebuilding our mapper in Canvas+WebGL via Pixi.js.

[1][https://app.visualsitemaps.com/share/7b4fd8556b102ed739cc308...](https://app.visualsitemaps.com/share/7b4fd8556b102ed739cc308efdf78c9f)

------
mkchoi212
Cool library! I like it when libraries don't have too many configurations set
by default.

e.g.
[https://github.com/danielgindi/Charts](https://github.com/danielgindi/Charts)
has so many default options set for a graph that 80% of the code is doing
something like

    chart.option.isEnabled = false

------
kickout
Any other libraries that are as low-level as d3.js? Is D3 still being used
heavily in production, in people's experience?

~~~
sailfast
Used in production? Yes.

Alternatives at a low level? Hard to say - it really does allow you to do a
lot at a very low level if you want BUT...

- Vega-Lite / Vega

- HighCharts (paid)

- ChartJS

- Raphael (unsure if this is still used as much)

- Leaflet / Turf for GIS visualizations

- Server-side? PyGal? [http://www.pygal.org/en/stable/](http://www.pygal.org/en/stable/)

If you don't want to go so low-level, there are a huge number of D3
abstractions that let you pick your chart and work with the data. Britecharts
from Eventbrite is one example of an actively maintained abstraction.
[http://eventbrite.github.io/britecharts/](http://eventbrite.github.io/britecharts/)

~~~
kickout
Thanks for this. Never seen HighCharts. Looks nice (pending $$$)

~~~
vosper
We use Highcharts at work, and have for years. It's a really good library,
definitely worth the money. We had a recent adventure with building new charts
in Victory, but performance was not good and we came back to Highcharts and
removed all our Victory code.

We're still happy with that decision :)

------
stared
As much as I try to like ObservableHQ, the fact that updates happen above the
code--sometimes a bit far above, so if you scroll too much you don't see
them--is one thing that makes it less intuitive than the Jupyter Notebook
style.

It took me some time to realize that running cells actually changes something
in the country list.

~~~
TeMPOraL
It's a feature, not a bug. ObservableHQ notebooks are _reactive_ - there's a
DAG underneath. It's not a Jupyter-style execution log.

~~~
K0SM0S
> _there's a DAG underneath_

Would you or someone care to elaborate? Do we mean the 'linking' between
'nodes' (cells), as an interpreter/compiler would do over a file/object
structure?

I assume that would open up a world of options, e.g. for vectorizing
performance, type checking and all?

If so, we have in one such notebook a true slice of a "visual" IDE of the kind
Microsoft could only ever dream about! (so far) ;-)

[Side-related note: I'm amazed at the emergence of the notebook paradigm over
the last 10 years, accelerating for at least the last 2-4. See how they do it
at Netflix. There's a case to be made that the notebook paradigm could really
bridge the "programming rift" between nerds and, well, everybody else, at
least in skill-driven professional contexts. There's a short way from here to
a slew of clever domain-driven scripting languages plugging straight into BI.]

~~~
TeMPOraL
> _Would you or someone care to elaborate? Do we mean the 'linking' between
> 'nodes' (cells) as an interpreter/compiler would do over a file/object
> structure?_

A DAG, or Directed Acyclic Graph. The most familiar example would be the
dependency graph between packages. Or the dependency graph of your code (as an
interpreter/compiler would look at it). Or any dependency graph in general.

So the way reactive programming works - whether in React, ObservableHQ, or
Excel - is this: you have these computation units (cells, pure functions)
which have dependencies and are themselves depended upon. This forms your
calculation graph, which you evaluate by starting at the nodes without
dependencies and computing one node after another in topological order[0].

The main optimization this permits is reducing the number of calculations:
since dependencies are accounted for and navigable, whenever a node X changes,
only the nodes that depend on it need to be recomputed (and their dependents,
recursively).

"vectorizing performance, type checking and all" are not related to this
concept. Reactive programming deals just with the dependency graph and
(re)computing the right amount of nodes in the right order. Contrast that with
a typical REPL model (or Jupyter model), where you execute cells one after
another in the order you wrote them, and they mutate the global state of the
application.
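To make the contrast concrete, here is a hypothetical toy model of reactive
cells (plain JavaScript, not Observable's actual runtime; all names are made
up, and it assumes the graph is acyclic):

    const cells = new Map();  // name -> {deps, fn, value}

    const defineCell = (name, deps, fn) =>
      cells.set(name, {deps, fn, value: undefined});

    // Transitive dependents of `name`, found by a simple fixpoint walk.
    function dependentsOf(name) {
      const out = new Set();
      let changed = true;
      while (changed) {
        changed = false;
        for (const [n, c] of cells) {
          if (!out.has(n) && c.deps.some(d => d === name || out.has(d))) {
            out.add(n);
            changed = true;
          }
        }
      }
      return out;
    }

    // Evaluate in topological order: a cell runs only once its deps are done.
    function recompute(names) {
      const pending = new Set(names);
      while (pending.size) {
        for (const n of [...pending]) {
          const c = cells.get(n);
          if (c.deps.every(d => !pending.has(d))) {
            c.value = c.fn(...c.deps.map(d => cells.get(d).value));
            pending.delete(n);
          }
        }
      }
    }

    // Changing a cell recomputes only that cell and its dependents.
    function setCell(name, value) {
      cells.get(name).fn = () => value;
      recompute([name, ...dependentsOf(name)]);
    }

    defineCell("x", [], () => 2);
    defineCell("doubled", ["x"], x => x * 2);
    defineCell("label", ["doubled"], d => `result: ${d}`);
    recompute(cells.keys());   // initial topological evaluation
    setCell("x", 10);          // reruns only "doubled" and "label"
    cells.get("label").value;  // "result: 20"

Running a cell in Observable behaves like `setCell` here: the change
propagates down the DAG, which is why effects can show up in cells far away
from the one you edited.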

RE your side note: yes, the notebook thing is a curious phenomenon, especially
in a worse-is-better way (why did it have to be first Python, and now
JavaScript?!). It's much older than that, though - you could trace its origin
through things like Mathcad (essentially a buggy Jupyter requiring lots of
clicking, but which produced convincing-looking math papers and could do
proper symbolic calculations out of the box), back to the early Lisp era (you
don't have to type things into a Lisp REPL; if you type them in a file and
annotate with comments as you go, you get a half-baked plaintext Jupyter).

--

[0] - [https://en.wikipedia.org/wiki/Topological_sorting](https://en.wikipedia.org/wiki/Topological_sorting) - i.e. you turn a graph into a sequence sorted so that the dependencies come before the things that depend on them.

~~~
K0SM0S
Ah, I see now, thank you very much for the detailed explanation.

Having used Excel for years as a barebones "logical framework" of sorts (in my
teens, before I knew better, to solve various optimization problems in games -
notably things like "best in slot" or "best resource distribution"), I've
internalized a deep intuition for reactive programming. I had never realized
this was an actual paradigm!

On optimization / O(n): models tend to (d)evolve into highly recursive 'traps'
with this approach, in my experience. I learned the value of e.g. indexes,
isolating concerns, and generally larger but flatter surfaces.

RE notebooks: I had no idea there was such a history there. It's interesting
that the approach only became popular fairly recently.

~~~
TeMPOraL
You're welcome!

Fun thing I recently discovered about Excel: there's a button in it, Formulas
tab > Formula Auditing > Trace Dependents (and its sibling, Trace Precedents),
which makes Excel draw arrows between cells, letting you explore the
underlying calculation DAG.

Could you tell me more about those 'recursive' traps?

RE notebooks, personally I blame it on a combination of a) Python taking the
scientist community by storm (perhaps thanks to SciPy), where prior popular
scientific toolkits were proprietary, b) the popularization of lightweight
markup languages (like Markdown), and c) the popularization of the browser as
a runtime.

There is a history of scientists using org-mode for computational notebooks
and publishing purposes, ticking both a) (a powerful, open toolkit supporting
not only Python but just about anything) and b) (a very good markup language,
org-mode), but it ties potential collaborators to Emacs, so it had no chance
to popularize. I don't know the relative timeline of org-mode code evaluation
vs. IPython/Jupyter, so I can't say whether it qualifies as prior art.

~~~
K0SM0S
> _Could you tell me more about those 'recursive' traps?_

Well, these feel like 'traps' insofar as you suddenly fall into a crawl where
things were fine just a step before. It's really a hands-on engineering kind
of situation - you can't feel it much with small datasets. I'm sure you know
the kind.

I really tried to write something worth reading, but after about a page at it,
I'm afraid these are just the ramblings of a young mind before learning to
program, etc.

Here's the gist:

- I discovered circular dependencies, breaking the DAG.

- Off-by-one errors on base cases.

- O(n^x) without realizing it, which hurts _later_.

It's just that now I know much more expensive words and concepts to describe
or solve these 'traps'. ;-)

RE notebooks, I think Python is the language of choice for science and data
for various reasons that made it a no-brainer for IPython/Jupyter, whose
primary purpose was clearly datavis afaict. You can plug in community kernels¹
for just about any language, though I'm not sure how well they integrate with
the tooling (I know Julia is popular for math in Jupyter).

Despite having a terminal open 24/7, I never actually tried Emacs and
org-mode, and I feel I've missed a whole space in that regard...

Notebooks in their current form are certainly popular, but I hear of too many
good features in other paradigms to think there isn't room for improvement (or
for yet another strong solution).

[1]: [https://github.com/jupyter/jupyter/wiki/Jupyter-kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)

~~~
TeMPOraL
Thanks for elaborating.

Yes, the reactive paradigm definitely includes extra challenges - the DAG
that's actually being executed is usually implicit for the person reading the
code, so as it grows large, it may cause surprises and generally be hard to
follow. If you've ever worked with C/C++, you've seen this in action as the
recompilation problem: you change one innocuous header file, and suddenly half
of your project needs to be rebuilt (the #include directives in your project
files are what form the dependency DAG).

I wouldn't worry too much about circular dependencies. Reactive systems
usually need to know the dependencies of each component to build a DAG for
execution (whether you explicitly declare them or they get read from your
code), so at that point cycles can be detected; you would have to be clever to
cause an infinite loop here. There are ways around the apparent occasional
need for circular dependencies (this is the same problem as circular
dependencies in software architecture in general, and the same solutions
apply).

(Though to be honest, I wish for a computation system that would work with
cyclic graphs. Some "circular dependencies" are feedback loops, and I don't
see a reason why a scientific-computation-oriented system couldn't try to
compute a fixpoint, or let you view the looped execution over time.)

> _O(n^x) without realizing, which hurts later._

O(n^x)? Not sure. A bunch of reactive "cells" in a DAG is no different than
calling each of them one after another in the right order; if you get a sudden
>= O(n^2) out of this, it just means some of your cells are doing dumb things.
Note that cells don't get re-executed just because you referred to them a
couple of times. If you have:

    Cell 1: x = strlen(someString); // O(n)
    Cell 2: for(i = 0; i < N; ++i) { doSomethingConstantTimeWith(x); } // O(n)

You don't re-execute Cell 1 multiple times, so the overall complexity is O(n),
not O(n^2).

> _I missed a whole space in that regard..._

You missed a bit, alright, but I'm not sure I should recommend you go and
investigate, given that it can be a time suck (albeit a very rewarding one)
:). But if you're willing to risk it, be sure to read some propaganda material
on how Org Mode is the best thing since sliced bread (it is), and if you've
never seen Lisp before, be sure to check it out eventually.

~~~
K0SM0S
Having a mental picture of the DAG of any execution is the sort of spatial
intuition we're generally good at, I agree; it's the "implicit" graph we tend
to build by association. I've very little experience with C/C++ (intro level
at best), but from the Go angle I can see how that kind of dependency tracking
is needed to avoid huge compilation times.

> I wish for a computation system that would work with cyclic graphs.

It is baffling to me that we haven't such a paradigm available. I don't know
much about academic CS but I'm fairly sure there's one among a gazillion
formal languages that describes circular spaces.

Intuitively, I'd think it would have interesting applications for the
programming (modeling, computation, reasoning) of oscillatory phenomena in
particular.

I totally agree with you in 'practice', though I've tested literally none of
it, even on paper. The basic paradigm, in a best-effort thinking-aloud, is
that any statement execution is a loop in itself, which fundamentally gives
objects a 'thickness' in time - a time dimension - and thus some _φ_ or _θ_
property (angular-_whatever_ you want to measure, some periodicity expressed
in a common clock).

Based on this, circularity is not a problem but a feature, and this would
define some Fourier transform of a "program", a system of elementary
executions - its periodicity in time, how "big" the loop is.

I don't know; it's really interesting to think about such a paradigm for
representing and programming 'models', 'problems', behaviors.

About O(n^x), I guess I was trying to be as general as possible. Indeed, that
was exactly "dumb things"! In retrospect, I'd argue it's possibly by going
everywhere, including into the dumb, that you really get a "feel" for a
particular problem/solution space. Like flawed DAGs ;-)

When you naively translate ideas into computations (like a recipe to game
something optimally), it may end up looking more like:

    # a bunch of discrete values,
    # may be n-dim with indexes, table lookups...
    Column 1: x = [1, 2, 3,..., xn]
    Column 2: y = [10, 20, 30,..., yn]
    Column 3: z = [(x+y), 2*(x+y), 3*(x+y),..., zn]

    # programming horror
    Columns 4, 5, 6...:
    for i in x:
      for j in y:
        for k in z:
          {inefficientImplementationOf f(i,j,k)}
          # 200-char highly redundant Excel formula

    # games have "levels" (for all objects potentially)
    # levels change rules: recursion down to L1 to compute
    Sheet 2: # "level 2", new indexes x, y, z, w...
             # calls L1 for every single cell

In effect, that last block (new sheets) creates new 'real' dimensions (with
weird metrics) over the first 2D arrangement (sheet 1). The result is a very
non-smooth surface, actually not even fully coherent in many cases (lots of
exceptions).

And that's when you don't optimize, because you'd rather copy numbers (monkey
brain that can't make educated guesses) than find the actual functions (which
must be stupidly simple, because games can't perform complex computations, but
are admittedly made to be hard to reverse-engineer). Basically, Excel as a
numerical emulator for some game space, some system to be gamified (empirical
optimization based on axiomatic rules).

I sure have fond memories of trying to crack these problems, with a high
success rate (like physics, it's the real world, so you approximate all that
needs to be). I was one of those guys making turn-key "calculators", e.g. for
items or progression in games like WoW - tools to solve complexity. The most
interesting were social tools, e.g. for players to "fairly" distribute some
resource (a positive 'reward' or negative 'work') based on some KPI - how
ethics and values translate into numerical models is quite the challenging but
satisfying problem, I find.

About Lisp, I assume you mean the programming language? That's indeed probably
#1 on my list of "different things" to try. I've read about people who
literally _grew up_ in Lisp, in the 1980s iirc, and how that changed their
perspective, well beyond mere programming. I've probably read the wiki page
and a few articles over the years. But right now I've just committed to doing
a Lisp trip (training + a small personal project) this year - yours was the
straw that broke the procrastination's back.

(To be honest, I have a weird history with programming: I started before 10
with BASIC, but I'm only taking it up professionally now (career change), some
30 years later. Go figure. Life.)

Thank you for elaborating and all the good advice / perspective.

------
gfxgirl
If you get rid of the `width = 940` line, the charts become responsive -
`width` is a preset ObservableHQ variable.

There's more work to do, but it's a start.
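For instance (a hypothetical sketch of a chart cell; the `height` and the
drawing code are assumptions), the notebook's built-in reactive `width`
re-runs the cell on resize:

    chart = {
      const height = 420;                           // made-up fixed height
      const svg = d3.create("svg")
          .attr("viewBox", [0, 0, width, height]);  // reactive builtin `width`
      // ...append marks here as before...
      return svg.node();
    }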

