plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.
The interactivity and cross-filtering possibilities are really good when you have many data points.
What I miss in vega-lite is animation with object constancy (https://bost.ocks.org/mike/constancy/), especially since they're building on top of d3.js.
Also, Altair plots can certainly be used outside of notebooks! Happy to share some examples.
• Not allowed to set figure size
• No wrapping of tick labels
• No strings for pandas aggregation functions
• No automatic ordering of x/y labels (dexplot provides several options)
• Having to use separate grid functions (catplot, lmplot) for multiple subplots
• Something like 5 different functions for scatterplots. Dexplot has one.
• No relative frequency bar charts, which are a fantastic way to explore data. Dexplot provides normalization over any set of variables
• No stacked bar charts
• Seaborn docs have distribution plots (box, violin) in the "categorical" section. A major distinction needs to be made between plots that aggregate, those that show distributions, and those that plot raw data (like scatterplots)
• Returning of matplotlib axes or seaborn grid objects. Dexplot always returns the matplotlib figure
• Seaborn is essentially dead as far as I can tell with few changes in the last 2-3 years. There are even parameters that continue to be non-functional
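For anyone unsure what the relative-frequency bullet above means in practice, here is a minimal sketch of the underlying computation in plain pandas. It uses neither Dexplot nor seaborn and is not a claim about how either library implements it; the toy DataFrame and its column names (`species`, `island`) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "c"],
    "island":  ["x", "y", "x", "x", "x", "y"],
})

# Raw counts of each species/island combination.
counts = pd.crosstab(df["species"], df["island"])

# Relative frequencies within each species (every row sums to 1).
by_species = pd.crosstab(df["species"], df["island"], normalize="index")

# Relative frequencies over the whole table (all cells sum to 1).
overall = pd.crosstab(df["species"], df["island"], normalize="all")
```

Calling `by_species.plot.bar(stacked=True)` on the result would draw it as a stacked bar chart through pandas' matplotlib-based plotting.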
In the future, Dexplot will add:
• Many more plotting functions
• Several apps (built from ipywidgets) to explore data. Currently, there is one for viewing colors
• Better automatic figure sizing (it exists now, but will be improved)
• Automatic DPI detection so that matplotlib inches correspond to actual screen inches
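The arithmetic behind the DPI item in the list above can be sketched as follows. This is only an illustration, not Dexplot's implementation: the function name `screen_dpi` is hypothetical, and the hard part (detecting the physical screen size automatically) is not shown, so the resolution and diagonal are passed in by hand.

```python
import math


def screen_dpi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch of a display, from its resolution and diagonal size."""
    if diagonal_in <= 0:
        raise ValueError("diagonal_in must be positive")
    # Diagonal length in pixels divided by diagonal length in inches.
    return math.hypot(width_px, height_px) / diagonal_in


# Example: a 1920x1080 panel with a 24-inch diagonal (assumed values).
dpi = screen_dpi(1920, 1080, 24.0)
```

Passing a value like this as the `dpi` argument when creating a matplotlib figure is one way to make `figsize` inches come out close to physical inches on that screen.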
Dexplot aims to be very intuitive, easy to use, and consistent, and to allow easy exploration (the name is a smashing together of data exploration plotting).
Here is one example comparison between dexplot and seaborn. https://twitter.com/TedPetrou/status/1271436948721328129
Examples such as these are what drove me to create the library.
I'd love to get feedback and happy to take detailed criticism.
Seaborn has received a couple of updates this year. Also, I'm not sure what you mean by not being able to control figure size. The ways to do so are inconsistent, but they're there.
> I'd love to get feedback and happy to take detailed criticism.
I like your syntax a lot. This page isn't a good way to show it. Seaborn's gallery page is excellent, even if redundant at times. I would dedicate more time to creating more easily useable docs like that. Docs are almost everything when it comes to charting.
Also need to see stuff on how to control aesthetic things like color, outlines, style, etc.
You cannot control the figure size of axes-level plots from seaborn directly. You have to access the figure from the axes (which most people don't know how to do) or create the figure first by importing matplotlib. Really annoying for those that just want to analyze data quickly. Grid plots have the ability to adjust figure size, but return a seaborn object and not a matplotlib figure.
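For reference, a minimal sketch of the two workarounds described above. To keep it self-contained it uses matplotlib only; the `ax.plot(...)` calls are stand-ins for a seaborn axes-level call (for example `sns.scatterplot(..., ax=ax)`), and the data and sizes are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Workaround 1: create the figure first, then draw onto its Axes.
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot([1, 2, 3], [2, 4, 3])  # stand-in for a seaborn call that takes ax=ax

# Workaround 2: start from an Axes you were handed back and resize
# its parent figure after the fact.
fig2, ax2 = plt.subplots()
ax2.plot([1, 2, 3], [3, 1, 2])  # stand-in for the Axes a seaborn call returns
parent = ax2.figure             # the Figure that owns this Axes
parent.set_size_inches(10, 4)
```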
Agreed, docs need to get better. Better datasets, a gallery, etc... I've only spent a week on this, so there will be a lot of improvements in the future.
Still, I'd like to ask if you considered alternatives to MPL for the back-end. It's a venerable but ancient project with years of accumulated technical debt, and I'm sure you had to deal with lots of inconsistencies there.
For example, PyQtGraph is an alternative with a clear class hierarchy and can handle large-scale datasets without slowing down (while anything non-trivial in MPL has you waiting seconds for it to render).
(I'd love to hear more suggestions that don't require a JS engine and don't build on MPL.)
I'm definitely open to looking at alternative backends in the future and will check out PyQtGraph, but am sticking to matplotlib for now.
Thanks for the effort. Looks like a great project.
There should be a library to do exploratory data analysis quickly, without having to touch matplotlib, numpy, or pandas, and without installing something like pandas-profiling to make reports.
This is where the apps will come in to allow users to quickly generate reports on things like missing values, duplicate rows/columns, outliers/bad data, view different colors, etc...
It helped me settle on https://altair-viz.github.io/ (coming from matplotlib) and I never looked back
For instance, if you take this (sample) report using Altair to plot Default Alive / Default Dead: https://datapane.com/leo/reports/startup_finance_report/ - the interactive code is actually relatively small: https://gist.github.com/lanthias/5a41c1e4b21ae274ddb95cf5ad1...
It's also great being able to add Altair shapes to Folium for geoplotting (as the vega geoplotting is a bit more low-level).
What I really think is missing in the ecosystem is a "vega for tables", so you could build rich, interactive tables with a similar grammar. That would rock.
In my business, we have a lot of test data on a database, where everyone uses their own python-based solutions for plotting, mostly done in a Jupyter notebook.
Guess what: comparing results and keeping one dedicated style guide for the project then creates more complexity than needed.
-> I tried Orange3 for a while, which is really intuitive to use, but I miss a direct connection to a DB. Any advice is warmly welcome :-)
> • No relative frequency bar charts, which are a fantastic way to explore data. Dexplot provides normalization over any set of variables
> • Seaborn docs have distribution plots (box, violin) in the "categorical" section. A major distinction needs to be made between plots that aggregate, those that show distributions, and those that plot raw data (like scatterplots)
Excellent documentation, but as others have said docs are everything with charting, so expanding that and adding a gallery are probably the best ways to get people onboard. Looking forward to giving this a try.
I've been using it for years and by far it's been the easiest plotting library (while still being really flexible).
The company is very active in development; for example, it recently added plotly_express, which lets me get charts with one-liners like: px.line(df, x='x_column', y='y_column')
I'm not affiliated with Plotly, just curious what people think: I find it to be an awesome library, but I rarely hear about it or meet people who use it.
edit: but due to this post I checked again and it seems this is no longer an issue. So +1 to plotly.
The dexplo package (a dataframe library) also sounds really interesting. They also have a bar chart race, which looks like an excellent way to get to the front page of Reddit's r/dataisbeautiful.
I'm curious, how does this compare to Chartify? Note, I am not affiliated with Spotify or the Chartify team.
- overlapping labels
- HTML tables overlapping the rightmost bar
You can also publish this to the web (https://datapane.com/leo/reports/stock_report_5d2925b9/), embed it into social media (https://medium.com/@leo_26134/embedding-with-datapane-366e60...), or deploy your Jupyter Notebook or Python script so that other people can run them with parameters to generate reports dynamically.
It's still pretty new, so if there's anything you'd like to see, give me a shout at leo [at] datapane.com.
> A matplotlib figure object is returned [from each plot function]