Can you make a pitch to a Python/R user to give this a try?
What you’ve built looks very nice, and I’ve heard nothing but good things about Elixir elsewhere, but it would take a lot to leave those much more robust ecosystems. Do you hope to grow into that over time? Is there enough in terms of viz, statistical models, and ML to survive?
José from the Livebook team. I don't think I can make a pitch because I have limited Python/R experience to use as reference.
My suggestion is for you to give it a try for a day or two and see what you think. I am pretty sure you will find weak spots and I would be very happy to hear any feedback you may have. You can find my email on my GitHub profile (same username).
In general we have grown a lot since the Numerical Elixir effort started two years ago. Here are the main building blocks:
* Nx (https://github.com/elixir-nx/nx/tree/main/nx#readme): equivalent to NumPy, deeply inspired by JAX. It runs on both CPU and GPU via Google XLA (also used by JAX/TensorFlow) and supports tensor serving out of the box
* Scholar (https://github.com/elixir-nx/scholar): Nx-based traditional machine learning. This one is the most recent effort of them all. We are treading the same path as scikit-learn, but we are quite early on. However, because we are built on Nx, everything is differentiable, GPU-ready, distributable, etc.
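To give a flavor of the Nx API, here is a minimal sketch you can run as a plain script (the version pin is illustrative; in Livebook the `Mix.install` goes in the setup cell):

```elixir
# Fetch Nx for a standalone script.
Mix.install([{:nx, "~> 0.5"}])

# Tensors work much like NumPy arrays: build, broadcast, reduce.
t = Nx.tensor([[1, 2], [3, 4]])

doubled = Nx.multiply(t, 2)          # elementwise broadcast
total = Nx.sum(t) |> Nx.to_number()  # reduces to a scalar

IO.inspect(Nx.to_flat_list(doubled)) # [2, 4, 6, 8]
IO.inspect(total)                    # 10
```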
Regarding visualization, we have "smart cells" for VegaLite and MapLibre, similar to how we did "Data Transformations" in the video above. They help you get started with your visualizations and you can jump deep into the code if necessary.
José's reply suggests the basics have Elixir equivalents. I can't really speak to that side, but I can say the usability story is much, much better.
The last time I gave Jupyter notebooks a go, it was a full session of installing and updating various Python tools (pip, conda, jupyter), then struggling with Python versions. You end up piecing together your own bespoke setup based on other people's outdated bespoke setups you find while searching for your error messages. Maybe that's better now; this was a few years ago.

For Livebook it's "download the app and run it." Other options exist and are well documented and straightforward. I set up a Livebook server on our k8s dev cluster with a pretty simple Deployment I wrote just from looking at the Livebook README notes on Docker. We've made livebooks that connect to the Elixir app running in a different namespace on the cluster. Very cool.
Once you have Livebook going, the `.livemd` file is both version-control friendly AND a very readable Markdown file, rather than the big JSON objects used in `.ipynb`.
With Livebook, rebuilding cells is a lot more repeatable. It also does a good job of determining whether re-executing a cell is necessary when a previous cell is modified, which can save you a lot of time. Likewise, the dependencies are captured at the top of the notebook, so I've never had a problem when sharing a livebook: the other person always gets the same results I did. I don't remember how this worked in Jupyter, but it's really cool to collaborate with someone by both joining the same notebook session. It's like working in the same Google Doc, except you are writing and executing code.
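For a sense of what that looks like in version control, a `.livemd` file is just Markdown with fenced Elixir cells; a hypothetical minimal example (names and version pins are made up):

````markdown
# My Analysis

```elixir
Mix.install([{:kino, "~> 0.9"}])
```

## Load data

```elixir
data = [1, 2, 3]
Enum.sum(data)
```
````

Diffs on a file like this read like diffs on any other Markdown document, which is a big part of why sharing and reviewing notebooks works so well.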
Now with the Publish functionality I can see using a livebook to throw together some functionality and share it with non-technical users in your org, while having it backed up to git for posterity.
I avoided Smart cells for a while because I didn't like the "magic-ness" of the UI hiding what the code was doing, but as José has shown in the launch videos this week, you can easily see the code they are backed by and replace the cell with that code if you want to take full control. Maybe it was always like that, but I didn't realize it at first. They really make setting things up very easy without limiting you later on.
I'm pitching it to the data science department of the company I work for (a huge insurance company in my country) next week.
They do a lot of prototyping from CSV/parquet sources in Python and R.
I've waited to show them Livebook because Elixir syntax is somewhat alien to many, but now that the Livebook team has integrated ML models and dataframes (Explorer, through Polars in Rust) as smart cells, I think they have a killer feature on their hands, much like LiveView was for the Phoenix framework.
Let's see how it goes, I'm fairly optimistic about it.
When José first launched Livebook I thought it was ambitious to take on the Jupyter project, but here it is, getting better and more usable every single day.
Huge props; the Livebook project is an incredible example of what is possible with Elixir.
Once again, the Elixir community takes something and brings it to the next level.
In my opinion José Valim is the Linus Torvalds of programming languages, but unlike Linus, not only is he a 100x engineer, he's also one of the humblest and kindest people I've ever met (don't get me wrong, I love Linus, but he can be "too honest" at times and come across as harsh or rude).
From Elixir came Phoenix, from Phoenix came LiveView, from LiveView we got Livebook, iterating from good ideas to quality products like it's an easy task.
Can't wait to see what trick they have up their sleeve next.
IIRC he was behind the very popular Rails authentication gem Devise, as well. Really unbelievable how much of a boon he’s been to the open source community.
The dude is superhuman. An absolute machine in terms of programming output. Very engaged with the community. And extremely patient with people who have wrong opinions :)
S3, S4, R6, and reference classes. To be fair they are situational and not one size fits all. The stricter ones are mainly used in biostats where significant metadata makes more sense in OO. S3 is nice and easy, primarily just a list with dispatches. Everything else is less so.
The coercion always gets on my nerves. JavaScript gets a bad rap, but R is pretty damn warty too; weird-ass data types ('ordered factor', anyone?) that just seem so very far away from design choices in other languages, without being particularly ergonomic or aesthetically appealing.
I remember when we first used R in a stochastic class. The professor (a mathematician) was in love with the language and the students (computer science) considered the language to be the PHP of science.
As a computer scientist and programming-languages nerd, I think R is a much better language than Python (comparing the two only because Python is leading in the data science field).
I also believe that the tools available are superior, RStudio is very good IMO.
Because it’s built around a very specialized set of needs (data manipulation, visualization, and statistical modeling), and it is essentially best in class at it, but it has quirks as a result. Anyone coming to R from a background in another language will feel those quirks intensely and assume it’s bad.
JavaScript and C also have weak typing. And Python isn't statically typed; it's just that, like Ruby, it doesn't allow implicit conversion, except with numbers.
Extremely cool! This is the first time I'm learning about Livebook's smart cells, and seeing how easily you can toggle between the UI and the underlying chain of dataframe operations in code is pretty mind-blowing.
Elixir runs on the Erlang VM, which is designed for distributed multi-process concurrency, right? Why would this be advantageous for interactive data analysis work, which is typically done on a single node? I don't quite understand the use case, hoping someone can explain.
Use cases that come to mind are data distributed across nodes and things like distributed training of machine-learning models (which is getting more and more focus as models get bigger).
First, to make sure we are on the same page: distribution in Erlang happens across nodes, and concurrency happens within a single operating-system process. Erlang calls its concurrency primitive "processes" (because they are also isolated and preemptive), but that can cause some confusion (hence this comment).
From now on, when I mention processes, I mean Erlang VM processes; they are very lightweight and you can create millions of them.
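As a tiny illustration of how cheap these processes are, here is a plain Elixir sketch that spawns 100,000 of them and collects a message from each (no dependencies needed; on a typical laptop this finishes quickly):

```elixir
parent = self()

# Spawn 100_000 lightweight VM processes; each sends one message back.
for i <- 1..100_000, do: spawn(fn -> send(parent, {:done, i}) end)

# Collect all replies in the parent process.
results =
  for _ <- 1..100_000 do
    receive do
      {:done, i} -> i
    end
  end

IO.puts(length(results)) # prints 100000
```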
I can think of a few different ways where concurrency can help interactive data analysis:
1. Livebook supports rich outputs where each output is a process. This means your notebook can communicate with outputs as it executes. For example, it is very easy to train a neural network and push data to a graph as it arrives, or to process data and plot it as you go.
2. You can use concurrency to run several experiments at once within the same notebook. We support this in Livebook via "Branched sections". You can prepare the data and then start several branches/processes to digest the data in different ways without a need to start several notebooks.
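Outside Livebook's UI, the same "prepare once, branch several ways" idea can be expressed directly with Elixir's `Task` module; a minimal sketch (the "experiments" here are placeholder computations):

```elixir
# Prepare the data once...
data = Enum.to_list(1..1_000)

# ...then run several independent "experiments" on it concurrently.
tasks = [
  Task.async(fn -> Enum.sum(data) end),
  Task.async(fn -> Enum.max(data) end),
  Task.async(fn -> data |> Enum.map(&(&1 * &1)) |> Enum.sum() end)
]

[sum, max, sum_of_squares] = Task.await_many(tasks)
IO.inspect({sum, max, sum_of_squares}) # {500500, 1000, 333833500}
```

Because each task is its own VM process, a slow or crashing experiment does not block or corrupt the others.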
When it comes to distribution, it is quite similar to the above, because the concurrency and distribution primitives in the Erlang VM are the same. Here is an example of how easy it is to take an ML model from concurrent to distributed: https://news.livebook.dev/distributed2-machine-learning-note...
Generally speaking, I think we should start from the opposite side: we should try to make everything concurrent by default and fall back to serial only when we cannot. Especially for data analysis, where moving data is expensive, we may end up incurring a lot of overhead if the only form of concurrency is via the network or inter-process communication.
One last note: perhaps the most important bit of the Erlang VM for data analysis is that it favors a functional style. Livebook notebooks are strongly reproducible. I expand on this in this video: https://www.youtube.com/watch?v=EhSNXWkji6o
I hope this helps (and feel free to tell me if I missed the mark!).
PS: I know many of the videos above are machine-learning related; that's because we have started our data journey only now, although the principles should generally apply. Hopefully more data videos will come soon! :)
Languages and runtimes often grow beyond their original scope. And since the introduction of Dirty NIFs to the Erlang VM five years (or so) ago, integrating with native code (which is what powers a lot of data analysis and machine learning tools in high-level languages) has become a real possibility. There is a similar-ish discussion here: https://news.ycombinator.com/item?id=35572128
The video overview says the journey in data is "just starting". That's exciting! Any ideas or vision for where it's going in the future?
Also, I noticed in the demo that installing polars was very fast! Was the video trimmed, or was it cached or something? I remember when I last tried out polars in Elixir a while ago, it had to build the rust library and everything and took like 10 minutes.
Things that are on our roadmap in relation to our data vision:
* Data management within your notebook (https://github.com/livebook-dev/livebook/issues/1604) - we want you to be able to link files, URLs, and object storages to your notebook and have them automatically managed/downloaded
* We want to make it easier to build visualizations (even easier than the current Chart smart cell) and also be able to filter a dataframe by selecting a visualization: https://github.com/livebook-dev/livebook/issues/1545
We have other ideas, such as making SQL a more prominent citizen in Livebook and being able to build a custom canvas as you work on your notebook, but those will likely take longer to realize.
A vote here for a SQL cell. I want folks to use Livebook and Explorer more, but a very easy win for data folks who are not familiar with Ecto and are mostly writing complex select statements would be a SQL code block that can easily reference a connection.
That would let people who are getting into Elixir for data work run a query, get an Explorer.DataFrame, and interact further that way.
Livebook already has a SQL Smart cell (https://livebook.dev/integrations/sql). It doesn't integrate with Explorer yet, but it's already possible to reference a database connection, run a SQL query, and visualize the results in a table.
We’ve been investing a lot in making Elixir great for data exploration.
Today we’re taking one step further in this journey by contributing to the Explorer library and integrating it with Livebook.
Explorer is an Elixir dataframe library built on top of Polars (from Rust) and inspired by dplyr (from R).
Its integration with Livebook (open-source code notebook for Elixir) makes it easier to explore and transform dataframes interactively.
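To give a taste of the dplyr-flavored API, here is a minimal standalone sketch (the version pin is illustrative; in Livebook the `Mix.install` goes in the setup cell):

```elixir
Mix.install([{:explorer, "~> 0.5"}])

# `require` enables Explorer's dplyr-style query macros.
require Explorer.DataFrame, as: DF

df = DF.new(country: ["BR", "US", "BR"], value: [10, 20, 30])

# Filter rows with a column expression, pull a column as a series, reduce.
total =
  df
  |> DF.filter(country == "BR")
  |> DF.pull("value")
  |> Explorer.Series.sum()

IO.inspect(total) # 40
```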
Let me know if you have questions about these new features or anything related to Livebook’s launch week. :)