Dataflow, a self-hosted Observable notebook editor

simonw · on May 13, 2021

This project looks fantastic.

I adore Observable notebooks, but the one thing that makes me hesitate in using them for everything is that the editor component itself is closed-source and only available on https://observablehq.com/

They're great open source ecosystem supporters - they released their runtime, their parser, their standard library and all sorts of other stuff through https://github.com/observablehq - but the editor itself is their proprietary sauce.

I totally support their decision on this - it's what they're building their business around, and I want them to be successful. But as a user it does give me pause.

This project from Alex Garcia looks like a fix for exactly that. Having more than one editor for their notebook format (and an open source option at that) resolves my hesitancy in leaning hard into their ecosystem.

I don't even see it as a competitor to ObservableHQ - the hosted Observable editor has collaboration features that don't even make sense for a locally running version.

Plus, Dataflow has some great ideas of its own - in particular the live file attachments thing.

edtechdev · on May 13, 2021

Yeah the lack of open source prevented me from committing to observable, too, so I look forward to trying dataflow out.

Just in case this is of interest to others, some other open source browser-based computational notebook tools include:

* Starboard https://starboard.gg/
* And of course there's always Jupyter, but it requires a server component

And this isn't the same thing, more of a javascript playground (open source alternative to codepen and the like), but see also Slingcode: https://slingcode.net/

kragen · on May 13, 2021

Thank you for the awesome recommendations! Note that Dataflow isn't open source yet, though.

kragen · on May 14, 2021

Fixed, as per https://news.ycombinator.com/item?id=27149141!

kragen · on May 13, 2021

I agree that the proprietary editor has been a showstopper for what is otherwise a very appealing advance in programming environments; many of us are old enough to have learned the hard way not to base our careers on proprietary infrastructure.

But it's not clear that this new project is a fix for that: there's no license file or licensing notice on https://github.com/asg017/dataflow and no mention of licensing in https://observablehq.com/@asg017/introducing-dataflow. So, unfortunately, under Berne Convention copyright laws (which is most of them nowadays) the software is by default restricted by copyright, and looking at it may put you at legal risk, because access plus substantial similarity is deemed to prove copying.

Now, possibly that isn't their intention; https://github.com/asg017/dataflow/blob/main/package.json does say "license": "ISC", so maybe it's just an oversight. But I'd like to see a much clearer and more unambiguous statement of their intent to irrevocably commit Dataflow to an open source license before touching it.
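The mismatch described above (a `"license"` field in package.json but no license file) is the kind of thing that can be checked mechanically. Here is a minimal sketch, assuming a local checkout of a repository; the `license_signals` helper and the temporary example directory are hypothetical, and the result only reports what the files say, not what the author intends or what is legally binding.

```python
import json
import tempfile
from pathlib import Path


def license_signals(repo_dir):
    """Report what a checkout says about its license; proves nothing legally."""
    repo = Path(repo_dir)
    # Top-level files whose names start with LICENSE, LICENCE, or COPYING (any case).
    license_files = sorted(
        p.name
        for p in repo.iterdir()
        if p.is_file() and p.name.upper().startswith(("LICENSE", "LICENCE", "COPYING"))
    )
    declared = None
    manifest = repo / "package.json"
    if manifest.is_file():
        # The "license" field is optional, so .get() may return None.
        declared = json.loads(manifest.read_text(encoding="utf-8")).get("license")
    return {"license_files": license_files, "package_json_license": declared}


# Hypothetical checkout mirroring the situation in the comment above:
# package.json declares ISC, but there is no license file next to it.
with tempfile.TemporaryDirectory() as tmp:
    manifest_text = json.dumps({"name": "example", "license": "ISC"})
    (Path(tmp) / "package.json").write_text(manifest_text, encoding="utf-8")
    result = license_signals(tmp)

print(result)
```

An empty `license_files` list alongside a declared license is exactly the ambiguous case: the metadata hints at an intent, but there is no license text to point to.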

alexgarcia-xyz · on May 14, 2021

hey author here, you're right, completely missed adding a permissive license. Just added an explicit MIT license with the latest version, thanks for bringing it up!

kragen · on May 14, 2021

That's wonderful! A significant step toward fully automated luxury gay space communism! Thank you!

kragen · on May 14, 2021

It's disappointing to see an expression of gratitude for the expeditious resolution of a licensing hiccup downvoted to -4. I'm left to speculate on why so many people responded by downvoting it as if it were spam.

Possibility 1: they are opposed to full automation?

Possibility 2: they don't think software licensing is a potential obstacle to full automation or sufficiently equitable access to the resulting abundance?

Possibility 3: they just have no idea what fully automated luxury gay space communism is, and lack the curiosity to look it up, and so they react without thinking?

Possibility 4: they hate luxury or gay people, so they object on principle to the vision of the future embodied in the phrase?

Possibility 5: they don't think human-factors improvements in software development environments rise to the level of importance implied by my thanks? (But if so, why are they reading this thread at all?)

I'm curious what could possibly motivate this kind of astoundingly hostile reaction.

thirtyseven · on May 13, 2021

I know "dataflow" is kind of a generic name, but the authors might want to consider that there is already a 7 year old Google Cloud product for running data pipelines called Dataflow.

marcinzm · on May 13, 2021

And a Cloudera project: https://www.cloudera.com/products/cdf.html

And an Azure feature: https://docs.microsoft.com/en-us/azure/data-factory/control-...

And a Spring feature: https://spring.io/projects/spring-cloud-dataflow

rectang · on May 13, 2021

And an entire programming discipline.

https://en.wikipedia.org/wiki/Dataflow_programming

taftster · on May 13, 2021

Came here to post the same comment. Exactly right. There are lots of projects that use the term "dataflow".

To add to this, the name of this product is confusing given the context and use case shown. I assume "dataflow" to the author means the ability to watch data being rendered on a page?

To "big data" folks (like myself), the term "dataflow" tends to represent the routing and processing of data streams along an information pipeline. Not anything to do with a visual representation of a dynamic notebook.

dataflow · on May 13, 2021

> I know "dataflow" is kind of a generic name

Well that's a bummer. And here I thought I was being very unique :-)

axiosgunnar · on May 14, 2021

I know this sounds very Reddit-like, but

> created: October 15, 2012

nice :D

kragen · on May 14, 2021

They also named their whole platform "Observable", as in, extends java.util.Observable, and the equivalent in any other popular OO language. To disambiguate I normally call it "ObservableHQ", which is their domain name; I don't know what to do to disambiguate "Dataflow". "Asg017dataflow"? "Garcia Dataflow"? "Observable Dataflow"? "Issue 9 Dataflow"?

Still, much more significant is the fact that it seems to be such terrific software that this is a discussion worth having, because it's going to be very influential.

keeganj · on May 13, 2021

I'm not a data scientist, but I've been interested in the idea of a "code notebook" ever since Jupyter hit it big. I write mostly in JS/TS for application logic, so this looks like it could be really useful.

Related, does anyone have any recommendations of a (Postgres) SQL "notebook"? I don't really need any visualizations, more just a markdown integrated doc that allows me to lay out the different queries I use to answer a question.

javierluraschi · on May 13, 2021

For viz/DS/ML/AI with JS/TS, it is either observablehq or an IDE with custom extensions; this project looks relevant if you are already into observablehq.

Shameless plug, we are building a few tools for JS to narrow down this gap as well:

- https://hal9.ai (Drag&Drop / IDE)
- https://marketplace.visualstudio.com/items?itemName=Hal9.hal... (VSCode extension)
- https://observablehq.com/@javierluraschi/running-nodejs-in-o... (ObservableHQ extension)

Would love to chat if you are interested in providing feedback; you can reach me at javier at hal9.ai. Cheers.

natrys · on May 13, 2021

Emacs and Org-mode have great integration with multiple SQL implementations including Postgres (via org-babel). Org-mode tables are pretty neat, and you can have query results populated directly into tables. Read this blog post if you are interested:

https://fluca1978.github.io/2021/01/18/PostgreSQLLiteratePro...

Hasnep · on May 13, 2021

Rmarkdown notebooks can contain SQL chunks, so you'd only need to use R to configure the connection. [1]

[1] https://bookdown.org/yihui/rmarkdown/language-engines.html#s...

keeganj · on May 13, 2021

I didn't know you could write SQL directly in Rmarkdown like this, very interesting. Thanks!

pbowyer · on May 13, 2021

Same, when I've read the docs I've always got the impression that only R was supported.

PuddleOfSausage · on May 14, 2021

There are loads more supported via knitr. Scroll to the top of that linked page in this thread for the list.

amcaskill · on May 13, 2021

I am working on a SQL-in-markdown reporting tool called evidence.

It feels like a markdown doc that runs SQL.

https://evidence.dev/

keeganj · on May 13, 2021

This is almost exactly what I was imagining. Just subscribed to updates, very interested to see what this becomes!

simonw · on May 13, 2021

Weirdly my Django SQL Dashboard project may fit the bill a bit here: you can build up a "dashboard" (which is a tiny bit notebook-like if you squint at it the right way) with multiple SQL queries on it, and save that either as a bookmark or as a "saved dashboard" with a URL.

https://django-sql-dashboard.datasette.io/

In my own work I've been using it for the kind of things that I would normally use a Jupyter notebook for - gathering together research on problems I'm trying to solve.

keeganj · on May 13, 2021

Interesting take, I'm not deep in the python ecosystem, but this looks like it's lightweight enough to function as a refreshable notebook. Will give this a try, thanks!

qbasic_forever · on May 13, 2021

I like the ipython-sql magic in Jupyter: https://github.com/catherinedevlin/ipython-sql Depending on what you're doing you might be able to get away entirely with just using it and some basic queries, i.e. no python glue code in the notebook at all. But worst case you might need a cell to open up the DB connection and make the magic aware of it, then you can execute clean and simple SQL queries in cells using the magic.

thejosh · on May 14, 2021

Yeah ipython-sql is great and works well, and can use an environment variable for the connection string.

tlarkworthy · on May 13, 2021

https://observablehq.com/@observablehq/databases

okennedy · on May 13, 2021

It's based on Spark rather than PostgreSQL directly, but I'm part of an effort to build a workflow system disguised as a notebook called Vizier [1]. SQL is a first-class primitive in Vizier, and the notebook plays nice with Postgres (you can load from and unload to Postgres using Spark's native data loader).

[1] https://vizierdb.info

RocketSyntax · on May 13, 2021

Lots of jupyter magic `%` commands for that already https://www.datacamp.com/community/tutorials/sql-interface-w...

sixdimensional · on May 13, 2021

Apache Zeppelin is one open source option - https://zeppelin.apache.org.

gradys · on May 13, 2021

Maybe just a Python notebook with a Postgres client library and some helper functions to keep the amount of Python in the main body to a minimum?
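A minimal sketch of what such a helper could look like, assuming any DB-API 2.0 connection. The standard-library `sqlite3` module stands in for a Postgres client here so the example runs without a server; the `query` helper and the `visits` table are hypothetical names. With a Postgres driver such as psycopg2 the placeholder style would be `%s` rather than `?`.

```python
import sqlite3


def query(conn, sql, params=()):
    """Run one SQL statement and return its rows as a list of dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description is None for statements that return no result set.
        if cur.description is None:
            return []
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# sqlite3 is a stand-in (assumption) for a real Postgres connection.
conn = sqlite3.connect(":memory:")
query(conn, "CREATE TABLE visits (page TEXT, hits INTEGER)")
query(conn, "INSERT INTO visits VALUES (?, ?), (?, ?)", ("home", 3, "docs", 5))

# After the setup cell, each notebook cell is little more than the SQL itself.
rows = query(conn, "SELECT page, hits FROM visits WHERE hits > ? ORDER BY page", (2,))
print(rows)
```

With the helper defined once in a setup cell, the remaining cells can stay close to plain SQL, with markdown cells in between explaining each query.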

shapiromatron · on May 13, 2021

re: sql notebook, this came up a few months ago and worked great when I played around with it: https://blog.jupyter.org/an-sql-solution-for-jupyter-ef4a00a.... It's just a different kernel you can install to an existing jupyter instance.

robertlacok · on May 13, 2021

Deepnote has native Postgres cells :) you can mix them with Python too.

Disclaimer - I work there :)

Siira · on May 13, 2021

org-babel should fit the bill.

d--b · on May 13, 2021

I am also working on an alternative: https://www.jigdev.com

It’s the same idea except that cells are spread out on a 2d canvas with tabs similar to excel.

mistidoi · on May 13, 2021

As a total Observable/Bostock stan who works with HIPAA protected data, I love this.

Galanwe · on May 14, 2021

Does anyone know of a good reusable jupyter front-end?

I have a farm of jupyter kernels that I can run on demand, and would like to integrate a UI for these kernels on my React website.

I've had a look at the Jupyter default UI but it uses Lumino components which are basically not compatible with React.

Also had a look at nteract components, but their projects seem dead.

Anyone working on something similar? Clean react components to act as UI for the jupyter protocol.

nautilus12 · on May 13, 2021

I see all these notebook products and I honestly don't know how any of them plan to compete with AWS... nobody wants self-hosted anymore, everyone just wants to pay AWS or Databricks for it.

Can other people chime in? Maybe I'm just working at the wrong place.

simonw · on May 13, 2021

https://observablehq.com/ is a cloud hosted platform already.

This thing - Dataflow - is an open source run-on-your-own-machine alternative to the official Observable hosted solution, taking advantage of the fact that Observable itself is JavaScript code with some special sauce that's available as open source runtime/parser libraries.

qbasic_forever · on May 13, 2021

It's running on localhost here and I presume that's their intended use case for this feature. Localhost is critical for development--imagine if VS code wouldn't work unless you were connected to Github.com. This is fixing that issue with observable notebooks so now you can run and develop your notebook locally without depending directly on the internet or their cloud service.

nautilus12 · on May 14, 2021

I'm increasingly seeing companies embrace "don't develop locally". Personally the idea of having to sign into the AWS console to develop makes me cringe, but I'm seeing more people just be OK with it.

simonw · on May 14, 2021

Having been responsible for the shared local development environment system at a 100+ engineer company I can tell you exactly why: the amount of time and money wasted fixing individual developer environments is astronomical.

If someone's environment isn't working, having a button they can click to get a brand new working one in the cloud is an enormous time-saver.

nautilus12 · on May 20, 2021

I guess it's comparable to having a corporate uniform or getting to wear what you want

FormFollowsFunc · on May 13, 2021

I've been looking for something like this for data vis exploration. Compared to Observable, accessing local data files is more convenient. Currently I use a Jupyter notebook along with Pandas and Matplotlib. I'm not a huge fan of Matplotlib, so I would prefer to use Plot or the Vega-Lite API, and Pandas could be replaced with Danfo.js or Arquero.

RocketSyntax · on May 13, 2021

Help me understand what the page being rendered is doing. Is that like an interactive app you are serving for user input?

qbasic_forever · on May 13, 2021

It's an observable notebook: https://observablehq.com/ Basically a notebook where you write JS code and see the results immediately rendered in the notebook. In this case it's being served locally instead of requiring you to use their service website. If you've ever used Jupyter or IPython this is very similar (code notebooks) but with some interesting changes in philosophy and more of a Javascript implementation instead of python.

What might be tripping you up is that in this demo the observable notebook isn't showing the code cells, only the outputs. The code is in the editor on the left and the output on the right is the result of running the code as an observable notebook. In some ways it is like a simple interactive web app.

Isthatablackgsd · on May 13, 2021

Is that a similar concept to Overleaf for LaTeX?

kragen · on May 13, 2021

LaTeX doesn't really support interactive data visualizations or reactive re-rendering, and it's a pretty difficult environment to do things like read a CSV data file and do a linear regression in. Observablehq is closer to Jupyter, Excel, R Studio, Octave, or Tk than LaTeX.

Isthatablackgsd · on May 14, 2021

I'm sorry that I didn't specify. What I meant was a "live WYSIWYG", like how Overleaf compiles TeX on the fly. I hope that makes clear what I meant. Or am I misreading you?

kragen · on May 15, 2021

It's kind of like that, but LaTeX compiles to an image you can print out on a sheet of paper. By contrast, in ObservableHQ, or in Gnumeric, you can make a reactive document that responds in real time to something like dragging a slider.

https://observablehq.com/@kragen/plotting-with-a-slider is a really simple example, slightly tweaked from one of Mike Bostock's; you can drag the slider to change the parameters of the WebGL plot displayed below it. This gives you a much faster feedback loop than editing and recompiling in Overleaf. It's way more live than Overleaf.

Also, the document lives at a URL and you can link to it with different parameters. So instead of just being a static document, it's a program that takes its input from the URL, interacts with the user, and lets the user save their work as a URL and send it to somebody else, like https://observablehq.com/@kragen/plotting-with-a-slider?f=si... for example.

And then, as I said, if you want to do a linear regression in LaTeX, you are going to be a very sad panda very soon, because it's possible.

qbasic_forever · on May 14, 2021

Yeah that's the core idea with all code notebooks like Observable or Jupyter, etc. You get instant feedback and results right in the notebook. So you type some code, usually press a button to run, and instantly see the results right next to the code. It would be kind of like if your traditional IDE had a run button that compiled, ran, and dumped the output of your in development program as comments right below the lines you're editing.

RocketSyntax · on May 14, 2021

omg. i had no idea observable was js-focused. i always thought it was another R/python competitor.

chrisweekly · on May 13, 2021

OK! I can't put off creating an observablehq acct any longer.

... Done. Stoked to dive in this weekend!

lejohnq · on May 13, 2021

This is pretty awesome. Feels like a streamlit for the javascript world.

whoevercares · on May 13, 2021

How does this relate to data flow, or is it just a brand name?