Netflix open-sources Polynote, an IDE-inspired polyglot notebook (medium.com/netflix-techblog)
427 points by type_enthusiast on Oct 24, 2019 | 85 comments

I only ever find reproducible notebooks on the internet. It is really rare to get them from coworkers, and if they aren't trained as developers, it is almost impossible. Their solution to this problem looks efficient, simple, and brilliant:

> Writing Polynote’s code interpretation from scratch allowed us to do away with this global, mutable state. By keeping track of the variables defined in each cell, Polynote constructs the input state for a given cell based on the cells that have run above it. Making the position of a cell important in its execution semantics enforces the principle of least surprise, allowing users to read the notebook from top to bottom. It ensures reproducibility by making it far more likely that running the notebook sequentially will work.
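To make the quoted idea concrete, here is a minimal Python sketch of position-based cell state. It is not Polynote's implementation (which is written for the JVM); `run_cells` is a hypothetical helper, and it only handles simple top-level assignments. Each cell receives a copy of the names defined by the cells above it, so nothing from a deleted or lower cell can leak in.

```python
from typing import Any, Dict, List


def run_cells(cells: List[str]) -> List[Dict[str, Any]]:
    """Run cells top to bottom and return the namespace visible after each one.

    Hypothetical helper for illustration only, not Polynote's API.
    """
    states: List[Dict[str, Any]] = []
    state: Dict[str, Any] = {}
    for source in cells:
        namespace = dict(state)      # input state: names from the cells above
        exec(source, {}, namespace)  # the cell adds or rebinds names
        state = namespace
        states.append(state)
    return states


notebook = ["x = 1", "y = x + 1", "z = y * 10"]
states = run_cells(notebook)
print(states[-1])  # {'x': 1, 'y': 2, 'z': 20}

# With the first cell deleted, the next cell fails loudly instead of
# silently reusing a stale x.
deleted_cell_failed = False
try:
    run_cells(notebook[1:])
except NameError:
    deleted_cell_failed = True
print(deleted_cell_failed)  # True
```

Because the state is rebuilt from position, running the list top to bottom always gives the same result, which is the reproducibility property the quote describes.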

Thanks for the kind feedback. It's a young project to be honest, but I'm pretty proud of what we've done with only two contributors so far. With community participation I think we could support many more languages pretty quickly!

I really have always wished for reproducibility. Thanks for taking up this feature. How do you handle aliasing and references inside objects? Suppose I have

    #Cell 1
    a = [1,2,3]
    b = (a,True)

    #Cell 2
    b[0][0] = 5

    #Cell 3
Now if I change Cell 2 to

    # Cell 2'
    b[0][0] = 4

and execute, Cell 3's result becomes stale. Do you track such dependencies? Would really love to read more about the underlying implementation.

If you mutate an object itself, we can't really track that. There's no magic going on; you can break the state if you use mutable objects. It's less of an issue in Scala where immutable data structures are the norm, but I can imagine it would be disappointing in Python.

Currently it takes a shallow copy of the state output by each cell, meaning every value is going to be a primitive value or a reference. If it's a reference to mutable state, you're kind of on your own with respect to keeping reproducibility. I felt like this was a good compromise between strictly enforced reproducibility and practicality; if it turns out to be confusing we could consider deep copying the state, or having an option to do that (I could imagine it being pretty bad for efficiency in a lot of ML use cases, though).
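The shallow-versus-deep trade-off can be seen with the example from the question, using Python's standard `copy` module. This is only an illustration of the general behavior, assuming the per-cell state is a plain dictionary named `cell1_state` (a made-up name); it is not how Polynote stores state internally.

```python
import copy

# State after "Cell 1" from the question above.
a = [1, 2, 3]
b = (a, True)
cell1_state = {"a": a, "b": b}

shallow = copy.copy(cell1_state)   # new dict, but it shares the same list
deep = copy.deepcopy(cell1_state)  # new dict and a new list

# "Cell 2" mutates the list in place through the tuple.
b[0][0] = 5

print(shallow["a"])  # [5, 2, 3]  the snapshot changed underneath us
print(deep["a"])     # [1, 2, 3]  the snapshot is isolated
```

The deep copy keeps the snapshot intact, but it has to duplicate every object reachable from the state, which is the efficiency concern mentioned above for large ML data.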

I am not familiar with those notebooks. What would be wrong with re-executing all the cells below the one that changed?

That is usually a feature. The reason it's not the default every time you change a cell is that cells can contain long-running calculations.

You could also try Voilà to make a reproducible standalone app from a notebook: https://blog.jupyter.org/and-voilà-f6a2c08a4a93

According to the article, the most interesting feature compared to Jupyter is no hidden state - if you delete a cell, the variables it set are gone. Also, you can mix languages - you'll be able to access variables filled by previously executed cells in another language.

Personally, I'm looking forward to trying out the SQL support. I haven't seen an elegant solution for SQL notebooks in Jupyter, it was always second-class via Python or some such. Or have I missed something?

> Also, you can mix languages

Interesting. Since it seems to be implemented in a JVM language and a screenshot shows "Scala" as a supported language, I'm guessing at least all the JVM languages are supported (personally I hope for Clojure), but I can't seem to find a list of supported languages anywhere in the post or on the website.

What languages are supported by Polynote?

Currently just Scala and Python (via jep). Looking to add more (probably starting with Java and clojure) but haven't had time yet. There's just two of us working on it so far. PRs welcome!

Have you looked at GraalVM to bring in more languages?

Yep, we will be adding a plug-in to support Graal languages (little bit of learning to do on Graal first). We didn't want to make the project depend on Graal, though, because it's a pretty small segment of users (and we're not using it on our team at this point).

To verify, will this plugin allow external parties to add support to Polynote for more esoteric languages?

I don't want to over-promise anything given that I still have some reading to do about Graal's inner workings. But even given the interpreter side, there's also some plumbing to do (e.g. Monaco integration) that we haven't thought out yet. It's still in its infancy, and we'll need some help to be able to add stuff like this.

Get in touch with me if you want any help - I have experience in embedding Graal.

Looks like just Scala and Python right now:



also SQL

> Currently Scala, Python, and SQL cell types are supported.

The SQL support is done through Spark, so it's not particularly novel – Zeppelin for example supports SQL similarly. We've talked about adding a more general SQL interpreter, though. Happy to hear any suggestions about it!
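For readers who have not used a SQL cell, the general pattern is: data produced by an earlier cell is exposed as a table, and the SQL cell queries it. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for Spark, with made-up sample rows; it does not show Polynote's or Spark's API.

```python
import sqlite3

# Rows produced by an earlier "Python cell" (made-up sample data).
ratings = [("alice", 5), ("bob", 3), ("alice", 4)]

# Expose the rows as a table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user TEXT, stars INTEGER)")
conn.executemany("INSERT INTO ratings VALUES (?, ?)", ratings)

# The "SQL cell": plain SQL over data that came from the earlier cell.
rows = conn.execute(
    "SELECT user, AVG(stars) FROM ratings GROUP BY user ORDER BY user"
).fetchall()
conn.close()

print(rows)  # [('alice', 4.5), ('bob', 3.0)]
```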

Do you know of any generalized SQL interpreter that allows push-downs to the underlying engine where possible, but can also arbitrate compute resources for post-push-down operations, e.g. merging disparate result sets or making up for features the underlying engines lack?

Closest thing that comes to mind is something like Apache Drill, which coincidentally also uses Apache Calcite as the SQL interpreter.

Also wondering why I would use this over Zeppelin which can support other interpreters like Flink?

R notebooks do a lot of this stuff reasonably well, too. But then you have R as the primary language (other langs are well supported though).

As ever, the best answer is the Notebooks Are Bad, Actually

If anyone would like a docker image, I created one today: https://hub.docker.com/r/greglinscheid/polynote https://github.com/Vilos92/polynote

I like this as a concept, but the JDK / jep requirements are a bit of a turn off, personally... I understand they want it to speak Spark but that's not exactly how I would imagine it worked from the name or the "polyglot notebook" description

While the reproducibility problem is definitely an issue, I'm not sure it's such a big issue that I'd switch to a whole different notebook solution for it. For most notebook scenarios, running from scratch works fine to ensure it reproduces. Apart from this one feature, BeakerX does all the same things and fits a lot better into the existing Jupyter ecosystem.

To be clear, we're not out to supplant Jupyter. Anybody who's happy with their Jupyter setup will likely find little value in Polynote. But it has plugged some gaps we've had in our Scala ML research team at Netflix, so we thought others might see some value as well.

And there are lots of teams at Netflix that are investing in Jupyter as well! Plenty of room for both options.

Somewhat off-topic, but what's with the lambda replacing the "n" letter? I'm no expert in Greek but I thought lambda was the equivalent to the letter "l"...

The logo was hastily designed by an amateur (me). I figured most people would figure it out, pedantic people would complain, and we'd all have a good time :)

We've had some better options contributed in the past couple of weeks, but as long as we're going to change it I didn't want to rush that. So we stuck with my questionable typographic treatment for the blog post.

(Edit: autocorrect typo)

Atm it reads 'polilote' in Greek. You might want to substitute 'λ' for 'ν'.

Right? It would work as Poλynote.

See, we tried that, and to me it just looked like "ponynote". So far everyone who's mentioned "polylote" has been a current or former physicist, so maybe there's an interesting correlation there...

I'll toss in a vote to add support for https://www.ponylang.io/ and embrace "ponynote" :)

(this looks fantastic btw, I'm definitely going to explore it more in depth. thank you a TON for releasing it!)

Some might argue "ponynote" is a great nickname... ;-)

Also not a physicist FWIW <3

How about Polλnote?

I got this: Poλλλote.

I did the upside-down lambda with the right-side-up lambda because I thought it had a neat yin-yang look to it. Probably should have thought about how it would read to someone who reads Greek :picardfacepalm:

i mean thats how lisp had it, youre probably fine :)

Perhaps the product name is actually pronounced "Polllote"

You're more than welcome to pronounce it that way! :)

It looks like the editor this uses is Monaco, the editor in vscode, that’s pretty cool.

It does! Monaco is one of the many awesome open source libraries that made Polynote possible. We'll be discussing that at Scale by the Bay; check out our talk if you're going!

It seems like the tool was mainly invented to deal with the issue of hidden state in notebooks, but I don't honestly see what the big deal is. Hidden state in Jupyter notebooks is a gotcha that you can learn to deal with extremely quickly. I've been a Jupyter notebook user for several years, so I haven't had this problem often in recent memory, but I've led workshops where we teach users how to use the notebook. Inevitably hidden state issues come up, but students very quickly learn that restarting the kernel is a necessary part of the workflow and figure out when they need to do it.
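For anyone who has not run into the gotcha, here is a small Python sketch of it. It models a conventional kernel as a single shared dictionary (`kernel_globals`, a made-up name) rather than using a real Jupyter kernel, and models a restart as starting from an empty dictionary.

```python
# One shared, mutable namespace, the way a conventional kernel keeps state.
kernel_globals = {}

exec("threshold = 10", kernel_globals)          # cell A
exec("result = threshold * 2", kernel_globals)  # cell B

# The user deletes cell A from the notebook and edits cell B.
# The kernel still remembers threshold, so the edited cell keeps working.
exec("result = threshold * 3", kernel_globals)
print(kernel_globals["result"])  # 30

# After a "restart" (fresh namespace) the same cell fails, which is
# what someone else sees when they run the saved notebook from scratch.
fresh_globals = {}
failed_after_restart = False
try:
    exec("result = threshold * 3", fresh_globals)
except NameError:
    failed_after_restart = True
print(failed_after_restart)  # True
```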

If anyone on the Polynote team is lurking: curious if this is a successor to the great work done by NTeract, which you funded (thanks!)

That project experimented with a lot of interesting themes I see echoed here.

It's not a successor; nteract is a separate project (part of the jupyter ecosystem) and is alive and well. Polynote was started mainly to support use cases of our Scala-based ML engineering teams. It's a little bit apples and oranges.

I have a love-hate relationship with how RStudio deals with hidden state in notebooks. If you want to export an .rmd file to PDF, you have to run the whole notebook from start to finish in order, sorta proving that the thing is reproducible before export (maybe there is some technical reason as well).

It's nice to know that your report actually worked, and is not showing something odd because of hidden state, but sometimes you just want to print the darn thing now!

I need to resist the urge to package this as a standalone app. I don't really like the idea of running a separate server and having an editor tied to a browser, but wrapping everything in an app bundle with WebKit views seems like a nice side project.

Why resist? That would be awesome!

The server use case is real, though. Users typically like to run it on a beefy cloud machine with access to a Spark cluster.

There is already a tool for turning ordinary notebooks into standalone apps with all the needed source data inside - Voilà https://blog.jupyter.org/a-gallery-of-voilà-examples-a2ce7ef...

I wish someone would make a Jupyter alternative that is native, without the need to run a web browser, CSS, and a bunch of JavaScript just for a simple rendering task. Something based on Qt, GTK, or anything native and cross-platform.

A couple of years ago I would have totally agreed with this, and I had several abortive attempts to do something like that before starting Polynote. The problem is, it turns out that a notebook really needs to be able to display a bunch of really heterogeneous rich output. There's only one pre-baked way to support that, and it's HTML. So you can either embed HTML in your UI, or embed your UI in HTML. At least on the JVM (and at least for me), it turned out to be easier to do the latter than the former.

These are not native.

Sorry, but this has existed for a long time. It's called Emacs Org-mode literate programming :))

Gave this a try and it looks very promising. It would be great if GraalVM was integrated to extend the polyglot support to JavaScript, Ruby, R, in addition to Java, Groovy, Kotlin, and Clojure.

We didn't make graal a dependency, but we are absolutely planning to support graal languages in a plug-in. It's early days, though.

Maybe I'm not the right audience but why would notebooks need to have no hidden global state, and be reproducible? I personally use notebooks as a way to jot down things I would've tried in a REPL. Notebooks aren't meant to hold pieces of software; they are a dump of my explorations. I have a hard time understanding some of the requirements that went into the design of Polynote.

Because principle of least surprise is a good thing for almost any application?

> Notebooks aren't meant to hold pieces of software

For you they are not meant that way. But other people started using them that way. Netflix is a well known place for this workflow.

It's hard to understand how someone would want to share their explorations with others?

So interesting to see something like this right after vscode added jupyter notebook support this past month, which I was excited to see given how poor the editing experience is in standard notebooks, especially around intelligent autocomplete.

There are lots of cool developments in IDE notebook support - IntelliJ just dropped a plug-in for it as well. To be honest I'd be thrilled if an IDE solution could fill all the gaps that Polynote's targeting (our work would be done!) so I'm looking forward to seeing what develops.

Liking that built-in data visualization editor... That would be super cool in any SQL IDE.

Another somewhat off-topic comment: Someone please do this for stock prices / SEC data / financial modeling, with the ability to output into PDF (or PPTX) and you will conquer the world.

What do you mean? Jupyter notebooks can already work with any kind of data (including fetched over the internet), and can already output to PDF.

Are these the outputs that you'd have to generate because your client has to submit reports to the SEC? Or am I misunderstanding?

No, these are outputs that the client pays us to generate for them. These specific instances that I linked were from an example that was made public and then filed with the SEC, but usually these decks remain private

So you are talking about a type of notebook for Inline XBRL data rendering?

This isn't exactly XBRL as the calculations are all bespoke, done for the purposes of these specific presentations

My vision is separating financial analysis from formatting, or bringing CSS / markup languages into the world of Excel and sidestepping PowerPoint entirely if possible

PowerPoint is a slide presentation tool that is sadly used for authoring so-called "books" with lots of formatting in Investment Banking, Management Consulting and Corporate America at large... and its shortcomings become immediately obvious

As we speak (!!!), I'm having to reformat a book from our standard template into the client's template by manually resizing charts, recoloring series, etc... literally wasting hours of my time because we use Excel + PPT

Wouldn't that mean clients would have to give you templates in CSS / markup? Or would you maintain different templates for different clients?

And then the task would just be mapping the data generated to the templates, mad lib style?
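A rough Python sketch of what "mad lib style" mapping could look like, using the standard library's `string.Template`. The figures, company name, and the two template layouts are all invented for illustration; a real deck would need charts and layout, not just text.

```python
from string import Template

# Figures produced once by the analysis step (made-up numbers).
figures = {"company": "ExampleCo", "revenue": "$120M", "ebitda": "$30M"}

# Two presentation templates for the same data: a house style and a
# hypothetical client style.
house_style = Template("$company | Revenue: $revenue | EBITDA: $ebitda")
client_style = Template("$company\n  Revenue  $revenue\n  EBITDA   $ebitda")

house_text = house_style.substitute(figures)
client_text = client_style.substitute(figures)

print(house_text)  # ExampleCo | Revenue: $120M | EBITDA: $30M
print(client_text)
```

The point of the separation is that switching a deck to a client's format means swapping the template, while the analysis that produced `figures` stays untouched.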

Client templates happen, but they aren't the norm. I pointed that example out as one egregious waste of time, but simply conforming Excel charts / tables to the company's format is very time consuming on a day-to-day basis.

A lot of people just don't have an intuitive sense for design or plain don't care, so you're left with a high variance in the quality of the work that is produced. Speaking from my banking experience, the result is senior bankers have to spend time marking up documents on things like formatting, and junior bankers feel frustrated because they waste their productive hours on non-productive work. And it's generally a frustrating experience indeed as the tools we use aren't built for this purpose.

But formatting is just part of it. "Code" reusability (more like financial model reusability) is basically zero, auditing is a pain and people use many different approaches to do the same things, but again with varying levels of efficiency and accuracy. I can tell you 10 ways to calculate Total Shareholder Returns for a public company using the FactSet add-in in Excel. Also several ways to pull Revenue and EBITDA figures, and these aren't even the more esoteric metrics like Free Cash Flow or Funds from Operations.
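As an illustration of why a single shared definition helps, here is one simple convention for Total Shareholder Return written as a Python function: price change plus dividends per share, divided by the starting price, with no dividend reinvestment. This is only one of the many variants alluded to above, the function name and sample numbers are made up, and it has nothing to do with how the FactSet add-in computes it.

```python
def total_shareholder_return(
    start_price: float, end_price: float, dividends: float
) -> float:
    """Simple TSR: (price change + dividends per share) / starting price.

    Assumes dividends are not reinvested; other conventions exist.
    """
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return (end_price - start_price + dividends) / start_price


tsr = total_shareholder_return(start_price=100.0, end_price=110.0, dividends=2.5)
print(f"{tsr:.2%}")  # 12.50%
```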

These decks that we prepare should work like recipes. Input the ingredients and out comes the prepared dish, but everything is so god damn manual

Not to mention issues with Excel itself, like how Excel's (non-cascading) styles and defined names (similar to variables) propagate to other workbooks if you copy-paste content from one to another, which in the long-term creates files full of garbage that crash and corrupt frequently

The whole paradigm is a shit show. One day it will be different, I am certain. But nobody has taken a holistic, multi-disciplinary view of the problem, because bankers / consultants don't know what is possible with today's technology and the ones trying to address these usability issues are only looking at a couple of pieces of the puzzle at once

I was curious about this problem so I went digging around.

As I understand it, financial companies often want to gather data from multiple places but consolidated in a digestible form to make financial decisions. (what kind of financial decisions, I'm unclear on, since most trading is done by computers now, but maybe this is for ETFs or OTC trades) And they're used to looking at it on PDFs and PowerPoint, because that's what people email around, and no one trusts having financial slides on the web. (why, btw? In case they leak?) And because you have many clients that want the same or similar sort of analysis on publicly traded companies, you'd ideally be able to change the analysis once, and then generate all the PDF and PPT reports they want to see.

It does seem like a giant waste of time to cut and paste data from Excel into PowerPoint by hand. However, you should be able to export Excel data to PowerPoint via the Visual Basic Editor. (https://www.wallstreetmojo.com/vba-powerpoint/) Do people not use VB to prepare this?

I don't get the impression that analysts in finance would be willing to move from Excel to a notebook. Do you get a different sense? What would a notebook have to offer to get them to switch? Analysts generally seem to love Excel, with the exception of the slow first load and crash.

> As I understand it, financial companies often want to gather data from multiple places but consolidated in a digestible form to make financial decisions.

Yes, this is spot on.

> (what kind of financial decisions, I'm unclear on, since most trading is done by computers now, but maybe this is for ETFs or OTC trades)

Sales & Trading is but one part of banking. Yes, a lot of S&T is automated, but long-term strategy isn't defined by computers and neither is pitching to win new businesses. Besides S&T, there's also Restructuring (advisory and financing) and Mergers & Acquisitions. My opinion is written from the perspective of an M&A banker. I probably make ~3-4 PPT books every week on average.

> And they're used to looking at it on PDFs and PowerPoint, because that's what people email around, and no one trusts having financial slides on the web. (why, btw? In case they leak?)

Concerns over leaks are certainly a driver, but traceability and auditing also play a role. 3 years from now, I can certainly retrieve info that was attached to an e-mail, but a link may have long expired. Also, most recipients are over the age of 40, so they might not like using links in general

> It does seem like a giant waste of time to cut and paste data from Excel into PowerPoint by hand. However, you should be able to export Excel data to PowerPoint via the Visual Basic Editor. (https://www.wallstreetmojo.com/vba-powerpoint/) Do people not use VB to prepare this?

Certain tools like the add-ins provided by FactSet, S&P's Capital IQ and, less commonly, Bloomberg, export data into PPT with some metadata attached to it that allows you to refresh content quickly (in theory). It's all built as plugins on top of MS Office apps so your experience is not always smooth. Plus they don't solve the bigger issue of reusability

> I don't get the impression that analysts in finance would be willing to move from Excel to a notebook. Do you get a different sense? What would a notebook have to offer to get them to switch? Analysts generally seem to love Excel, with the exception of the slow first load and crash.

I think Analysts like spreadsheets with very responsive UIs and countless hotkeys. Muscle memory is a major thing in this business.

My vision of a solution would be one that still implements spreadsheet-like functionality but does so in pieces that connect together all the way through publishing via an integrated paradigm

And to be clear, such a paradigm could sit on top of Excel. It's the tooling around it and the workflow that I think will be solved. Exactly where the boundaries of a new app / system lie is open to debate, however

As someone with some amount of influence over the direction of Excel, I'd love to sit down with you to better understand this scenario. Would you mind if I contacted you off-HN? My contact info is in my profile. Thx!

Sure! Thanks for reaching out, it will be an absolute pleasure to discuss this further. I'll ping you from my work e-mail

Who would you talk to that would buy? Where do you find them?

Investment Banks, but maybe sell the functionality to a large provider like FactSet or Capital IQ who already have their packaged software licensed to big financial institutions

Alternatively build something more tightly integrated to MS Office and sell it to Microsoft

I was really intrigued to see a tool designed to help people learn languages. I was thinking spoken languages, though, not coding languages.

Does it have some sort of version control? Or maybe the internal representation can be cleanly stored in Git?

The notebooks are just .ipynb files (Jupyter's format, though apparently it doesn't like our notebooks very much...). So you can certainly store them in git. We don't have integration yet, but it's on our roadmap.
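Since .ipynb files are JSON, one common trick for cleaner Git diffs is to clear outputs before committing. The Python sketch below builds a tiny notebook in Jupyter's nbformat 4 layout (contents made up) and strips it with a hypothetical `strip_outputs` helper. Whether Polynote's files match this layout in every detail is an assumption, so treat this as a sketch of the generic Jupyter case.

```python
import json

# A minimal notebook in Jupyter's nbformat 4 layout (contents are made up).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["x = 1\n", "print(x)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["1\n"]}
            ],
        },
    ],
}


def strip_outputs(nb: dict) -> dict:
    """Return a copy with code-cell outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(nb))  # cheap deep copy for JSON data
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


cleaned = strip_outputs(notebook)
# Stable key order and indentation keep diffs small between commits.
text = json.dumps(cleaned, indent=1, sort_keys=True)
print(len(cleaned["cells"]))  # 2
```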

Good idea here. If only Netflix would move to a 100% F/OS stack without proprietary WebDRM.

This looks interesting
