
Show HN: Bamboolib – A GUI for Pandas (Python Data Science) - __tobals__
https://bamboolib.com/demo
======
westurner
This looks excellent. The ability to generate the Python code for the pandas
dataframe transformations looks to be more useful than OpenRefine, TBH.

How much work would it be to use Dask (and Dask-ML) as a backend?

I see the OneHotEncoder button. Have you considered integration with
Yellowbrick? They've probably already implemented a few of your near-future
and someday roadmap items involving hyperparameter selection and model
selection and visualization?
[https://www.scikit-yb.org/en/latest/](https://www.scikit-yb.org/en/latest/)

This video shows more of the advanced bamboolib features:
[https://youtu.be/I0a58h1OCcg](https://youtu.be/I0a58h1OCcg)

The live histogram rebinning looks useful. Recently I read about a
'shadowgram' / ~KDE approach with very many possible bin widths translucently
overlaid in one chart.
[https://stats.stackexchange.com/questions/68999/how-to-smear...](https://stats.stackexchange.com/questions/68999/how-to-smear-a-histogram)
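
As a rough numerical analogue of that overlay idea, here is a minimal sketch that averages histogram densities over several bin counts instead of drawing translucent layers. The helper name `averaged_histogram`, the bin counts and the random sample are all assumptions made for illustration; this is not how bamboolib or the linked answer implements it.

```python
import numpy as np


def averaged_histogram(values, bin_counts, grid_size=200):
    """Average histogram densities computed with several bin counts.

    Hypothetical helper for illustration, not part of bamboolib or Yellowbrick.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    grid = np.linspace(lo, hi, grid_size)
    layers = []
    for bins in bin_counts:
        density, edges = np.histogram(values, bins=bins, range=(lo, hi), density=True)
        # Find the bin each grid point falls into; the right-most edge
        # is clipped into the last bin.
        idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, bins - 1)
        layers.append(density[idx])
    # The mean over layers plays the role of the translucent overlay.
    return grid, np.mean(layers, axis=0)


rng = np.random.default_rng(0)
sample = rng.normal(size=500)  # made-up data
grid, smeared = averaged_histogram(sample, bin_counts=range(5, 50, 5))
```

Plotting `smeared` against `grid` should give a curve that looks smoother than any single histogram, which is roughly what the overlaid chart conveys visually.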

Yellowbrick also has a bin width optimization visualization in
yellowbrick.target.binning.BalancedBinningReference:
[https://www.scikit-yb.org/en/latest/api/target/binning.html](https://www.scikit-yb.org/en/latest/api/target/binning.html)

Great work.

~~~
kite_and_code
Thank you for your feedback and support :) Are you currently using OpenRefine?

We are currently thinking about supporting other dataframe libraries like dask
or pyspark and similar. However, we are a little unsure how to make
sure that there is user demand before we implement it. It is not a complete
rewrite but it would require some additional abstractions at some points in
the library. And we need to check if some features might not be available any
more. Would dask support be a reason to buy for you?

Great hint with yellowbrick and yes, we are considering some of those features
as well if there is a useful place in the library.

In general, we are also thinking about ways you can extend the library for
yourself so that you can add your own analyses/charts of choice and have them
come up again at the right point in time, in case this is useful.

~~~
westurner
In the past, I've looked at OpenRefine and Jupyter integration. Once I've
learned to do data transformation with pandas and sklearn with code, I'll
report back to you.

Pandas-profiling has a number of cool descriptive statistics features as well.
[https://github.com/pandas-profiling/pandas-profiling](https://github.com/pandas-profiling/pandas-profiling)

There's a new IterativeImputer in Scikit-learn 0.22 that it'd be cool to see
visualizations of.
[https://twitter.com/TedPetrou/status/1197150813707108352](https://twitter.com/TedPetrou/status/1197150813707108352)
[https://scikit-learn.org/stable/modules/impute.html](https://scikit-learn.org/stable/modules/impute.html)
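
For anyone who hasn't tried it, a minimal sketch of what IterativeImputer does: it models each feature with missing values from the other features and fills the gaps. The toy array is made up, and the explicit `enable_iterative_imputer` import is needed because scikit-learn marks the estimator as experimental.

```python
import numpy as np
# Required to unlock the experimental estimator before importing it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data where the second column is roughly twice the first.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 10.0],
])

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)  # observed values stay, NaNs are estimated
```

A visualization could then compare `X` and `X_filled`, for example by highlighting the imputed cells.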

A plugin model would be cool; though configuring the container every time
wouldn't be fun. Some ideas about how we could create a desktop version of
binderhub in order to launch REES-compatible environments on our own
resources:
[https://github.com/westurner/nbhandler/issues/1](https://github.com/westurner/nbhandler/issues/1)

------
sauwan
As a non-data-scientist who does some infrequent data analysis in python, this
looks amazing. But not something I think we can justify paying for with the
amount of analysis I do.

If this doesn't work out commercially, would you consider open-sourcing it?

I am curious how many data scientists don't already have pipelines that do
similar functions?

~~~
kite_and_code
Thank you for your feedback! How often do you perform analyses? And what
price do you think you could justify paying? We would like to find a
suitable pricing scheme for all use cases.

About open-sourcing: I cannot tell right now what the situation will be in the
future. But I can tell you that we believe in Open-Source and technologies
which don't impose vendor lock-in. This is also why we export the pandas
code. So, you are always flexible with your code and you own the result of
your work.

Basically, we want to empower people to use Open-Source software at the
core but we also want to make it as user-friendly as fully proprietary
solutions like Trifacta. So, you will have the best of both worlds without the
vendor lock-in.

We talked to many Data Scientists, and some had already started creating
similar packages etc., but they never got far because it takes a long and
consistent effort to cover most of the cases. Also, it quickly becomes a
software engineering challenge.

------
madmaze
Looks great, but not a fan of the licensing model! $600/year for something
that's 99% open source?

[https://bamboolib.8080labs.com/pricing/](https://bamboolib.8080labs.com/pricing/)

~~~
__tobals__
So if I understand you correctly, you deem the price for the annual license
too high? Could you elaborate on what you mean by 99% open source and how that
relates to your perception of the price?

~~~
set92
Because it is mainly using Jupyter Notebook, Python and Pandas.

These days it is normal for companies to build their own products on top of
open source products, but some people do not look kindly on that.

In my case, I don't think a tool is worth it if it only works in one specific
environment, doesn't have much functionality, and costs more than all the
JetBrains products. I also don't like that it is a tool built on top of open
source projects that tries to charge a large amount while having almost no
functionality.

~~~
IanCal
> Because it is mainly using Jupyter Notebook, Python and Pandas.

I really don't think that is correct. It integrates into those / builds on
them, but those projects absolutely do not have the features that I can see
playing around with this product.

------
simlan
Looks really interesting. The GUI is purely built on ipywidgets, correct?

~~~
kite_and_code
This is correct! ipywidgets is awesome because it merges the powerful Python
ecosystem with all the capabilities of the web including HTML, CSS and
JavaScript. We are super excited about what we might build by merging those
two worlds.

------
amrrs
Previous -
[https://news.ycombinator.com/item?id=20614896](https://news.ycombinator.com/item?id=20614896)

~~~
kite_and_code
Thank you for linking the old post because the last time some people doubted
that we would actually create this :)

~~~
mellosouls
To be fair, last time you posted a Show HN with nothing to show.

~~~
__tobals__
Fair point indeed, but the good news is: we improved on that :) And it won't
happen again.

~~~
mellosouls
Fair enough, good luck!

~~~
kite_and_code
Thank you :) Support is always super important in such an early stage!

------
amrrs
If this is supposed to replace Microsoft Excel - Would an enterprise be
willing to pay $500 / month for this thing instead of getting an entire Office
Suite?

I don't want to demean this amazing work but I think the user persona and the
price for the tier seem to be a mismatch.

~~~
kite_and_code
Interesting to see that you already think about replacing Excel. We won't go
that far. Currently, bamboolib is intended to save time for Python Data
Scientists and therefore it integrates perfectly into their working
environment. Python Data Scientists cost their companies between 2k and 10k USD
per month. And with bamboolib they should easily save 10h per month.
Especially if they need to explore new data sets or don't know the full pandas
API by heart. Thus, the price of 49$ per month should be a great deal because
we want to provide 10x value per cost.

On top, bamboolib aims to reduce the training time for new Data Scientists.

In addition, bamboolib makes pandas available to people who are proficient
at working with data but not specifically with Python or coding. Thus,
companies can let people with business knowledge work on the data
transformations and then hand over the code to Data Engineers who deploy it,
or similar.

What do you think about this?

~~~
argument_clinic
Awesome demo, so you deserve some honest feedback.

After the demo I looked at the pricing and immediately decided it's not worth
it by far.

From the viewpoint of a freelance software dev that does quite a lot of data
cleaning lately, the price is so high that I wouldn't even bother trying it on
binder.

As a comparison, I pay €53/year for PyCharm professional that I can install on
as many machines as I like and pay for my Excel/Office a similar yearly
amount. I switch between 3 computers, so having a license nailed to one of
them is a dealbreaker.

Also, $49 + taxes roughly translates to 1 hour of income per month - every
month, whether I use it or not. Plus I'd have to factor in the time it takes to
set up and deal with license problems & bugs. Setting up licenses behind a
company firewall is quite a challenge - unless you use a simple .txt file
license option like JetBrains. BTW, JetBrains also has a very cool feature in
the license model: If you pay for at least a year, you get to keep the last
version that's at least one year old for free. From my usage, I estimate that
bamboolib could save me 1 hour per month max - currently I just paste to Excel
if I need to scroll in a larger data set or use the .sample() function to look
at some examples.

So to tempt me there should be a freelancer license at a maximum of $49/year
that covers at least 3 machines (only use one at a time) and should work
offline.

BTW, the companies I work for all have not made the jump to Jupyter labs, yet.
They are firmly Excel based and I'm constantly trying to drum up interest for
Jupyter. I also do regular meetup talks on Jupyter (where normal business
people show up) and many of them don't know that it exists, yet.

So having a very cheap or even free personal license would showcase your
program to companies... and you could write the license in a way that
companies need to buy a full price version.

~~~
kite_and_code
Thank you so much for your honest feedback! That is what enables us to
serve you better in the future.

Also, thank you for the licensing input - so that we can consider and support
other options in the future.

Why do you think that you will only get 1h per month out of this? How many
hours per month do you spend with pandas? Given this estimate, I can totally
understand your price proposition of 5$/month because we also aim to provide
at least 10x value. However, we assume 10h savings per month. Did you already
see the data visualization features?
[https://www.youtube.com/watch?v=I0a58h1OCcg](https://www.youtube.com/watch?v=I0a58h1OCcg)

Did I understand you correctly, that you propose offering a free version
(because it might not make sense to charge less than 5$ per month anyway?) for
business and another one for 49$/year for freelancers/businesses? Or do you
also propose adding another company license?

------
Pinegulf
This looks user friendly and will certainly have an impact on people learning
python/pandas.

Yet it's not for me, as I do not like the 'click to create macro code'
approach. Such tools never have all the things I want, and I like to have my
code in my syntax. But that's me.

~~~
kite_and_code
Thank you for your feedback! :) We also don't like software where you are
restricted to what's available. That is why we integrated so tightly with
pandas: whenever something is not available, you can add the code in the
user interface or just in another cell. What do you think about this hybrid
approach?

~~~
ebg13
> _What do you think about this hybrid approach?_

What I want is to be able to add new functionality to the bamboo _UI_ via new
buttons for specific functionality that you don't have. Maybe if you had a
plugin architecture.

~~~
kite_and_code
That sounds great and is something that is already considered to some extent
in the current software architecture. Can you name one exact feature that you
would like to add? Also, feel free to reach out to us via email. We would be
happy to help you write the first plugin/extension :)

~~~
ebg13
The basic gist of one feature I want to add is a cell delimiter split that
turns one row into more rows (repeating the nonsplit cells) with grouping to
determine whether the splits happen concurrently or sequentially if multiple
of such splits happen in the same row (resulting in a Cartesian product-like
result or not). Right now you only have a split that turns one column into
more columns.

I have code for this already and I'd just want to add it to the UI.

~~~
kite_and_code
Sounds interesting and very similar to a combination of a string split and
then unpivot/melt if I understood you correctly. Please feel free to reach out
to us via email: info AT 8080labs.com and then we can discuss how you can
create an extension if you like :)
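
For reference, the split-into-rows idea can be sketched in plain pandas with `str.split` followed by `explode`. This is only an illustration under assumptions: the helper `split_rows`, its `paired` flag and the sample frame are hypothetical, it is not ebg13's code or a bamboolib API, and exploding several columns at once needs pandas 1.3 or newer.

```python
import pandas as pd

# Made-up sample data with two delimiter-separated columns.
df = pd.DataFrame({
    "id": [1, 2],
    "tags": ["a;b", "c"],
    "sizes": ["S;M", "L"],
})


def split_rows(frame, columns, sep=";", paired=True):
    """Split delimited cells into extra rows (hypothetical helper).

    paired=True  -> columns are split together, element by element
                    (each row's lists must have equal lengths).
    paired=False -> columns are split one after another, giving a
                    Cartesian-product-like result per original row.
    """
    out = frame.copy()
    for col in columns:
        out[col] = out[col].str.split(sep)
    if paired:
        # Multi-column explode is an assumption on the pandas version (>= 1.3).
        return out.explode(list(columns), ignore_index=True)
    for col in columns:
        out = out.explode(col, ignore_index=True)
    return out


paired_rows = split_rows(df, ["tags", "sizes"], paired=True)
crossed_rows = split_rows(df, ["tags", "sizes"], paired=False)
```

With this sample, `paired_rows` has three rows and `crossed_rows` has five; the unsplit `id` cells are repeated in both cases.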

------
__tobals__
bamboolib is a GUI for transforming and visualising pandas DataFrame objects
with little to no code. Feedback is appreciated!

~~~
mkl
Your hex plots use squares, not hexagons, so they aren't actually hex plots.

I think you should be much more up front that this is a commercial product,
because very few things based on Jupyter and Pandas are.

~~~
__tobals__
I think we have already fixed the plot naming. Did you see that in the video?

About the communication: I can understand that. We try to communicate clearly
that we both are commercial and support Open Data. It's not easy, however.
Especially when people see you for the first time.

In your view, where could we communicate this more clearly?

~~~
mkl
Yes, hex plots in the video.

I think you should mention it near the top of the demo notebook linked here,
if it's intended to be a main entry point. I looked at the notebook and videos
before heading to bamboolib.com, which has the first mention of pricing, at
which point I felt like I'd wasted my time (because I'd gotten the wrong
idea). I think most people are used to anything demoed in a Jupyter notebook
being open source.

~~~
kite_and_code
Thank you for your input and for pointing out where you would have expected
to see this info first.

------
bayesian_horse
In terms of resource utilization I recommend linking to the YouTube videos
first and not to a "binder" url that starts up a jupyter container...

~~~
kite_and_code
Totally true, we also thought about this. However, as we understood the terms
of Show HN, the link is supposed to go directly to a live demo? Maybe we
misunderstood this? Any advice on this would be appreciated.

------
pplonski86
Who is your target user? I think that if a user is able to install jupyter
notebook and pandas and load data with python, there is a high chance that
the user can also search the pandas documentation and write a few lines of
code.

Anyway, I like the idea of making a UI for Pandas, but I think that there
should be more comprehensive software to make data science easier for
non-coders.

~~~
__tobals__
Our main target user is a professional Python data scientist who wants to be
faster at data wrangling and visualization so that they can focus on
understanding the data instead of coding the same pandas commands over and
over again. That’s why bamboolib has both data transformation and exploration
features included. In the future, we will provide more sophisticated features
from which more experienced data scientists can also profit.

Yet, I definitely agree that bamboolib suits pandas learners and non-coders
especially well. Would be happy to have a direct exchange on your ideas on
this topic (feel free to pm me at tobiaskrabel at gmail dot com).

~~~
missosoup
This doesn't accelerate professional data scientists.

To me the only use of this tool is to make available the more complex uses of
pandas to individuals without the background/understanding of how to wield
those.

But without that understanding, giving those people a UI of functions they
have no understanding of is just a recipe for disaster.

All these tools that aim to lower the barrier to entry for data science
without fully automating it are doomed to fail because they have no audience.
The market for analysts who aren't also software engineers is shrinking to 0.

~~~
tastroder
> But without that understanding, giving those people a UI of functions they
> have no understanding of is just a recipe for disaster.

I can see use of the UI in a classroom setting to bridge the learning gap for
people that are pretty proficient with the utility that libraries like pandas
offer, but lack the experience with Python and reading documentation at this
point in time. I honestly fail to understand this critique; it sounds like
saying we should ban Excel because people could use it to calculate something
that doesn't make sense. It's not like pandas does something magical and every
half decent Excel user understands the functionality behind the buttons I see
in the bamboo demo linked here.

> The market for analysts who aren't also software engineers is shrinking to
> 0.

While I certainly get where this perspective might be coming from, I find it
unlikely to be true. The recent acquisition of Tableau and growth of similar
no-code tools show that it's untrue from a business perspective (just read
one of the HN threads on these topics, plenty of non-SE people making good use
of them). Even from the code perspective, outside of production most of the
data analysis code I see hardly shows any signs of good software engineering
practice and yet fulfills the task it is written for.

------
ebg13
I'm sad that it's not open source, because I want to add functionality and now
I can't.

~~~
kite_and_code
What kind of functionality do you want to add? The software is written in a
modular way and it is not too hard to extend it. Please reach out to us via
email and then we will help you to extend the library.

------
ddgflorida
Error loading 8080labs/bamboolib_binder_template/master!

------
RocketSyntax
wow. i love it. (1) it would be nice if the ui panes did not overlap the table
(2) what do you think about automated chart creation as seen in apache
zeppelin and databricks notebooks? (3) shaded cells for 0-100 gradients like
beakerx. (4) has this been tested with any large memory/ distributed memory
pyspark/ koalas libs?

~~~
kite_and_code
Thank you for your excited reaction and your comments :))

Why do you not want the UI panes to overlap? And what exactly do you mean?
Because the pane does not overlap but you can inspect and reach the full
table. Maybe it was confusing on the video?

About visualization: We also provide quite a few automated charts, as can be
seen here:
[https://www.youtube.com/watch?v=I0a58h1OCcg](https://www.youtube.com/watch?v=I0a58h1OCcg)

We are thinking about supporting other dataframe-like libraries and hopefully
we can support all of them. However, it is a matter of priority here. The
architecture enables this in general but we need to find users who actually
want and need this. Any idea in that regard is appreciated and if enough
interest builds up, we can definitely support this.

What do you mean by (3) with the shaded cells? Maybe you can give an example
here?

------
helloiloveyou
This is certainly amazing! Thanks!

------
mnist91
Looks awesome!

