Using Altair for most of my visualization in Python (fernandoi.cl)
290 points by cuchoi 6 months ago | 73 comments



I'm partial to plotnine (https://github.com/has2k1/plotnine). It is an implementation of the grammar of graphics in Python (like ggplot2).

It provides a stateless, layer-like interface which feels intuitive and avoids bugs due to global state.

But it is built on top of matplotlib. So if plotnine cannot do something, you can go back to the matplotlib way of doing things and take advantage of the large number of blogs, examples, tutorials, and Stack Overflow answers.

Performance-wise, it can handle more data than Altair, but less than matplotlib if you use the built-in data manipulation features.


Altair builds on Vega-Lite but if you cannot do something in Vega-Lite, you can do it in Vega.

Performance is a huge concern for us and we are working on some improvements in that area. We will first focus on pushing computation and aggregation down into the Python kernel.


The problem with all these nice new visualization libraries for Python is that they all (at least the shiny nice ones) fall totally short when it comes to doing B&W graphics for journal publications. Things like fill patterns, line patterns, etc. are mostly missing.

I still use Matplotlib and I can make it look beautiful and exactly how I want... it's just a lot more work to get the shiny bits.


For my thesis, I tried a couple of different options, but in the end the only one that really made publication-grade output was gnuplot with the epslatex terminal. It's a bit fiddly to get it up and running, but hands down the best result I think.

EDIT: Spelling.


The underlying vega library supports overriding styles for color and line properties - it may not be as difficult as you imagine to generate B&W graph outputs for print.
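
As a rough sketch of what such overrides could look like, here is a Vega-Lite spec built as a plain Python dict: the categorical palette is replaced with grays via `config.range.category`, and a `strokeDash` encoding makes the series differ by dash pattern. The field names (`x`, `y`, `series`), the data values and the choice of grays are all made up for illustration, and this does not cover fill patterns such as hatching.

```python
import json

# Hypothetical data: two series that should stay distinguishable in print.
rows = [
    {"x": 0, "y": 1.0, "series": "a"},
    {"x": 1, "y": 2.0, "series": "a"},
    {"x": 0, "y": 1.5, "series": "b"},
    {"x": 1, "y": 0.5, "series": "b"},
]

bw_spec = {
    "data": {"values": rows},
    "mark": "line",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
        # Vary the dash pattern per series instead of relying on hue.
        "strokeDash": {"field": "series", "type": "nominal"},
        "color": {"field": "series", "type": "nominal"},
    },
    # Assumed palette: override the default categorical colors with grays.
    "config": {"range": {"category": ["#000000", "#666666", "#aaaaaa"]}},
}

# Serialized form that could be handed to a Vega-Lite renderer.
spec_json = json.dumps(bw_spec, indent=2)
```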


Once you have a web / javascript output, there are a bunch of possibilities:

https://observablehq.com/search?query=stippling

https://observablehq.com/search?query=dithering

https://observablehq.com/search?query=halftone

https://observablehq.com/search?query=crosshatch

(Or go out and look around github or the broader web to find many more options.)


I save the data from Python or MATLAB and use pgfplots to create stunning plots. Nothing I saw in any other plotting lib ever came close to pgfplots in terms of beauty and flexibility.
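
A minimal sketch of the Python half of that workflow, assuming the data is just two columns of numbers; the helper name `to_pgfplots_csv` and the file name mentioned below are hypothetical:

```python
import csv
import io
import math

def to_pgfplots_csv(xs, ys, header=("x", "y")):
    """Return comma-separated text with a header row, one sample per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for x, y in zip(xs, ys):
        # Short fixed precision keeps the file small and readable.
        writer.writerow([f"{x:.6g}", f"{y:.6g}"])
    return buf.getvalue()

xs = [i * 0.5 for i in range(5)]
ys = [math.sin(x) for x in xs]
csv_text = to_pgfplots_csv(xs, ys)
# Write csv_text to a file such as data.csv (hypothetical name) for LaTeX to read.
```

On the LaTeX side, something like `\addplot table [col sep=comma, x=x, y=y] {data.csv};` inside an `axis` environment should pick the columns up by name.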


It is a great package. Two disadvantages are that the interface is not very pythonic and that there is no interactivity.


Second that. Also, for the scientific community at large, a big portion of the "not-so-happy" matplotlib users are just using whatever they or their admin installed some time ago, which is probably outdated and does not include a bunch of features introduced in v3.


Plotnine is the best when you are making a plot to put in a pdf or on paper.


This talk from PyCon 2019 is related and may be of interest: https://www.youtube.com/watch?v=vTingdk_pVM


His earlier talk is great too to get the lay of the land...

Jake VanderPlas The Python Visualization Landscape PyCon 2017

https://www.youtube.com/watch?v=FytuB8nFHPQ


> Not great statistical support. I still rely on Seaborn for quick visualization that needs to fit a linear regression.

Why does a plotting library need support for statistical analysis? I always think of manipulating data and visualizing it as orthogonal disciplines.


In the grand scheme of things it's a minor gripe, but I really dislike APIs that encourage chaining of calls like `alt.Chart(data).mark_circle(size=200).encode(...)` from the article. This is found in many other libraries, especially JavaScript ones like e.g. jQuery. I know it's compact, but it makes it harder to see what's going on and hides the fact that all of the operations are actually being applied to the leftmost object.
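
To make concrete what a chain like that does, here is a toy sketch (a stand-in class, not Altair's actual implementation) in which each method returns a chart object, so the chain can always be unrolled into named steps. `ToyChart` and its attributes are hypothetical.

```python
import copy

class ToyChart:
    """Toy stand-in for a chart object; not Altair's actual implementation."""

    def __init__(self, data):
        self.data = data
        self.mark = None
        self.encoding = {}

    def mark_circle(self, **props):
        new = copy.copy(self)  # leave the original object untouched
        new.mark = {"type": "circle", **props}
        return new  # returning a chart is what makes chaining possible

    def encode(self, **channels):
        new = copy.copy(self)
        new.encoding = {**self.encoding, **channels}
        return new

data = [{"a": 1, "b": 2}]

# Chained form, in the style of the call from the article.
chained = ToyChart(data).mark_circle(size=200).encode(x="a", y="b")

# Equivalent step-by-step form with every intermediate object named.
base = ToyChart(data)
with_mark = base.mark_circle(size=200)
stepwise = with_mark.encode(x="a", y="b")
```

In this sketch each call acts on the object returned by the previous call, which happens to be a copy; whether a given library mutates in place or copies is exactly the detail the chained spelling hides.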


Smalltalk has the correct answer here, I feel, with its message cascade operator `;`. For cases like this where you build complex state on an object via method calls, it gives some pretty nice code. And since it's a separate construct, the individual method calls can have return values that make sense too, as opposed to this style of interface where the methods all have to return the object again for chaining.


Altair is incredible until you realise your plots are bigger than your data. It's annoying, but overall still very good.


This is something we are painfully aware of and have started to work on. For a start, see Jake's work on https://pypi.org/project/altair-data-server/.


Can you speak more about this? Are you saying that Altair's speed scales poorly with dataset size?


By default it throws an error if you try to plot more than 5000 data points. Internally, Altair produces a JSON representation (Vega-Lite) that is larger than the data you plot, because it contains the data plus formatting information. If you save this output, for example in an IPython notebook, it gets fat pretty quickly.

Altair is still great though, but this issue makes it occasionally annoying.
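
A quick way to see the size effect without Altair installed: build a Vega-Lite-style spec by hand with the rows embedded inline and compare it to the same rows as CSV. The dataset here is made up, and the spec is a hand-written approximation of what gets generated, not Altair's exact output.

```python
import csv
import io
import json

# Hypothetical dataset: 5000 rows with two numeric columns.
rows = [{"x": i, "y": i * 0.5} for i in range(5000)]

# Compact baseline: the same rows as CSV text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["x", "y"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_bytes = len(buf.getvalue().encode("utf-8"))

# Vega-Lite-style spec with the data embedded inline, one JSON object per row.
spec = {
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}
spec_bytes = len(json.dumps(spec).encode("utf-8"))

print(f"CSV: {csv_bytes} bytes, inline spec: {spec_bytes} bytes")
```

Most of the overhead comes from repeating the field names in every row, so it grows with the number of rows and columns.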


Just in case: you can plot datasets larger than 5000 rows. A quick fix is using `alt.data_transformers.enable('json')`.
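
As I understand it, that transformer writes the data to a separate JSON file and has the chart refer to it by URL instead of embedding the rows. A hand-rolled sketch of the same idea, using only the standard library (the file name and the spec are hypothetical, not what Altair generates verbatim):

```python
import json
import os
import tempfile

# Hypothetical dataset, larger than the default 5000-row limit.
rows = [{"x": i, "y": i * 0.5} for i in range(10000)]

# Write the rows to a sidecar file next to the eventual chart output.
out_dir = tempfile.mkdtemp()
data_path = os.path.join(out_dir, "chart-data.json")  # assumed file name
with open(data_path, "w", encoding="utf-8") as f:
    json.dump(rows, f)

# Reference the file by URL so the spec itself stays small.
spec = {
    "data": {"url": "chart-data.json", "format": {"type": "json"}},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}
spec_text = json.dumps(spec)
```

The trade-off is that whatever renders the chart has to be able to fetch that file, so the notebook or page is no longer self-contained.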


Anyone figured out how to use Altair with VSCode to plot in a separate window? If I use matplotlib, I can use the show() method to plot in a separate window. I would love to have a similar thing for Altair. I'm fine if there's a working method in Spyder as an alternative.


My team and I are big Tableau users, so now that we are testing Python, Vega and Altair are the natural approach. I am struggling to make Altair work on JupyterLab (while ipyvega works, so I could start playing), but I think it’s worth the effort. The Altair guys are extremely nice and responsive on GitHub, which is great too. For those looking for side projects, I think a nice Altair GUI that works in JupyterLab would be great. Anyway, why is this approach superior? Because once you get it you’ll be amazed at the stuff you can do and how easily, and also at how easy it is to train new analysts on this. This is important, and matplotlib falls short. But don’t take my word for it: grab a trial copy of Tableau and see it for yourself.


Plotly Express (https://plotly.express/) is a new high-level wrapper on top of Plotly.py which gives a similar API to Altair in a lot of ways.

Under the hood it works the same way: Plotly.py generates JSON figure descriptions which are passed to Plotly.js for rendering.

The whole thing is, of course, free and open-source and well-documented :)


What visualization library are you using in Python?



Mainly Seaborn and Plotly. Holoviews and Altair are certainly noteworthy:

http://holoviews.org/


Plotly with Cufflinks is pretty great to integrate with pandas (wide range of quick examples here: https://kyso.io/KyleOS/cufflinks-intro)


Matplotlib, due to mental inertia. But nothing I do rises to the level of "visualization," just plotting. ;-) And my plots are rarely seen by anybody but me.


Plotly works well for me - interactive plots are great, and can produce HTML to embed elsewhere.


Veusz, because I wrote it!


Seaborn


As part of our work, we create data dashboards. We use NumPy and Pandas to analyze data, either Flask or Django as a framework, and HighCharts for interactive charts. The JS charting library has a variety of charts to meet our needs. Our input stored either as SQL or CSV is dynamic data.

I am beginning to wonder if HighCharts is replaceable with a fully open-source charting library.


Have you checked out the dash library by plotly?

It's pretty awesome and I use it for a similar purpose. My dashboards/charts are pretty basic, but there seems to be good support for interactivity and callbacks.

https://dash.plot.ly/gallery

https://plot.ly/python/


It should be pretty easy to set this up with Dash by Plotly, and you can host it in various places.

Here's a guide we've written for an app where you can drag and drop a file and get a chart from it.

https://kyso.io/KyleOS/creating-an-interactive-application-u...

But you can extend it to take an SQL query or maybe to set up an external API call.


> Sadly, in Python, we do not have a ggplot2.

https://github.com/sirrice/pygg provides the ggplot2 syntax in Python as a wrapper around Wickham's R implementation. It is useful if you 1) want the R syntax, 2) program in Python, 3) just want static plots.


These python tools are slowly approaching the quality of gnuplot. In a few years they will be almost there.


I'm looking for an interactive visualization tool that I could have site visitors access via a github blog page. Does Altair manage client-side processing? I'm still pretty new in the search but like Altair's adoption of Vega-lite.


Yes, you can save an Altair chart as an HTML page. See https://altair-viz.github.io/user_guide/saving_charts.html#h.... You can also just get the Vega-Lite JSON from Altair and embed it in a web page with Vega-Embed.
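
For the second route, a minimal sketch of generating such a page from Python. The helper name `vega_embed_page`, the example spec and the CDN version pins are my assumptions; with Altair, the spec would come from something like `chart.to_json()`.

```python
import json

def vega_embed_page(spec, div_id="vis"):
    """Build a standalone HTML page that renders a Vega-Lite spec with Vega-Embed."""
    spec_json = json.dumps(spec)
    # Version pins below are assumptions; use whatever matches your spec.
    return f"""<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="{div_id}"></div>
  <script>
    vegaEmbed("#{div_id}", {spec_json});
  </script>
</body>
</html>"""

# Hypothetical spec: a two-bar chart with inline data.
spec = {
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}
html = vega_embed_page(spec)
```

All rendering and interaction happens client-side in the visitor's browser, so a static host such as GitHub Pages should be enough to serve the resulting file.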


I'll dig in -- thanks! Looking for the interactivity -- surely that's not supportable in native HTML, right? Has to use JS or CSS? (I'm not a web-dev of any sort, only very superficial understanding, data scientist by day)

EDIT: Ohhhh, this looks nice!


Here's a guide to using Github+Kyso [1] to publish your type of article to the web, it should be a very similar workflow to github pages, and you can use any of the popular python visualization libraries - we support plotly, bokeh, vega, altair, matplotlib etc.

https://towardsdatascience.com/publish-data-science-articles...

[1] Disclaimer: I'm the CEO of Kyso


`bokeh` is another package that can fit this bill.


Plotly plots saved as html?


How is Altair on 3D data? I see no examples of this. Matplotlib is decent here (apart from all the same disadvantages the author lays out), but the default options look kinda fugly.


Reading the article would help with this question.


In the article it says Altair does not do 3D plotting, if that was your question.


Plotly supports interactive 3D scatters, lines, surfaces, cones, streamtubes, volumes and isosurfaces :)

https://plot.ly/python/3d-charts/


The article says it doesn't support 3d.


Expected some interface to an Altair computer and blinking lights.

People don't care anymore how they name their projects. :-(


Altair here is a play on the underlying library called Vega. They're both stars in the "Summer Triangle" https://en.wikipedia.org/wiki/Summer_Triangle


People don't call their programs IBM or Apple.

How about a JavaScript library to extract the second-to-last character of a string, and call it Commodore Amiga 3000?


One question:

Does it work in jupyter notebooks/labs?



Pretty cool.

> Sadly, in Python, we do not have a ggplot2.

I've got to ask, though: why is that the case?


No fundamental reason. ggplot2 was released 4 years after matplotlib, and the Python ecosystem was already centered around the latter by the time it became obvious that the grammar of graphics approach was superior. Python's surging popularity in the data analysis space is also pretty recent.

But any approach to a ggplot2 equivalent either has to abandon the massive ecosystem around matplotlib, or build on top of it, and matplotlib's heavily state-based approach makes that difficult. Plotnine is attempting to do that; I hear it's pretty good.


I'm not sure it is. I'm more of an R guy, but when I do use python I use plotnine[1], which is very ggplot2-like.

[1] https://plotnine.readthedocs.io/en/stable/


Isn't there a ggplot2-like library out there? I remember coming across one at some point.


There are several libraries inspired by the grammar of graphics and ggplot, and there has even been something like a port (although it's now abandoned):

https://github.com/yhat/ggpy

What I think the author means is that there is no ggplot in the sense that there's no One Ring To Rule Them All -- ggplot2 basically killed off lattice and base R graphics for about 90% of users. The Python graphics ecosystem is more Balkanized.


matplotlib and things built upon it, e.g. seaborn, are the dominant forces in Python visualization.


Excellent visualizations!

Could you share the code used to create your last sample chart?


This example gives you the main structure: https://altair-viz.github.io/gallery/ranged_dot_plot.html. Let me know if you need the specific code for that visualization, happy to share it.


Does anyone have a mirror?


It should be working now!


Altair doesn't seem to have quiver plots though. :(


I think you might be able to build one based on this Vega version https://vega.github.io/vega/examples/wind-vectors/


Anyone here tried/like metrics-graphics?


lol, looks like we broke the site :|

Bandwidth Limit Exceeded


Up again!


That post is the only thing on the entire site.


Yes! Just created this blog and this is my first post.


I came in here expecting to see one hell of a rationalization for using a 44 year old computer to do visualizations. :)


Haha I came here expecting to see the dude from Assassin's Creed drawing all kinds of graphs :'-) Or better, imagining.


A literal assassin killing me would be better than working with anything based on matplotlib; fortunately altair does away with it.


Haha, me too. Still interesting, just not nearly as interesting as expected.



