34 points by StreamBright 72 days ago | 45 comments

> As you might know, we will start developing bamboolib as soon as we have 10.000 subscribers!

No, I didn't know. Not a fucking mention of it. Would have been nice to have that on the page somewhere.

Bamboolib - an advert for a proposed GUI for Pandas

I was going to post and ask about an estimated release date; little did I know it would be an artificially imposed one. I was signup number 900-something. What happens if you guys never hit 10k signups? Is it just never going to be released? The GitHub repo has 200 stars but it’s just a few stub files... this looks like a marketing campaign for a fantasy product and now I’m wishing that I’d used a throwaway email. Am I missing something? Is the code for this actually available somewhere?

We are afraid that our effort on bamboolib will be wasted if there is no real interest in bamboolib, which is why we set ourselves the ambitious goal of 10k subscribers. We only want to build software that people REALLY want. Also, we developed edaviz.com and thus have opportunity costs.

The 10k subscriber figure is a soft goal, so we may still develop bamboolib if we get somewhat close to it, but 10k gives us a clear threshold. Also, if we decide not to do it, we will delete all email addresses. So you don't need to be afraid that your address will be abused.

Also please note that our landing page has only been online for a couple of hours and we already have hundreds of signups, so in my estimation, getting at least close to 10k is very feasible. This depends of course on whether people who really need bamboolib share it with others who also really need it. And we are glad that we have already found some people who would save 20-40h per month. However, the big question is: are we able to find more people like those, and are they able to find us? So, this is the challenge :) The challenge is big, and therefore we need support from others who are willing to share bamboolib if they really need it.

> We are afraid that our effort on bamboolib will be wasted if there is no real interest in bamboolib, which is why we set ourselves the ambitious goal of 10k subscribers. We only want to build software that people REALLY want.

How do you know if they really want it if all you have to show are mockups and no real functionality? Or even worse, videos of OTHER SOFTWARE, without mentioning that the footage is lifted from that and not from a WIP or anything like that.

This pisses me off way more than it should. The onus is ON YOU to do market research, not for the market to do research for you. You should know if this is in demand already or not. What's the point besides mining e-mail addresses?

I want to assure you that we did not do this to mine email addresses and I am sad to hear that you seem to have had negative experiences with this in the past.

Also, it was very clear to many people who watched the video that this is just a product vision, because I mention this multiple times in the video.

I can understand that you would prefer the "hacker way" of first coding something before talking about it at all. Actually, this is what we did with edaviz.com. During that work we came to the hypothesis that bamboolib might be even more interesting to many Python Data Scientists. However, we wanted to save our coding time because we might be slower than you. And thus, we decided to create the vision video first.

The landing page and the email addresses helped us to get in contact with users who are really passionate about the project and want to see it happen. However, the features within Jupyter are to some extent different than the features within Trifacta Wrangler which operates in the cloud. Also, the users of Trifacta Wrangler don't have the option to fall back to real code when they might want to.

I am sorry if the confusion upset you. We did not submit this post ourselves; we communicated via other channels (e.g. LinkedIn), where we mentioned our goal of 10k subscribers. Via which channel did you join the mailing list?

The website is just a landing page with a link to the demo video (here): https://m.youtube.com/watch?v=yM-j5bY6cHw

It took me a while to figure out that Pandas is a Python Data Analysis Library [0].

[0] https://pandas.pydata.org/

I wonder if you've come across Monarch?


It's a well-known commercial package that's been around for ages, and when I saw it, it seemed to have some good ideas around UI for data prep (e.g. seeing your dataframe at every stage, reordering operations and flipping back and forth) that could further inspire your GUI.

Thank you for pointing this out. So far, we did not know about Monarch.

Thank you for sharing bamboolib, StreamBright!

I am Florian, the co-creator of bamboolib and I am happy to answer any questions :)

Does this work or is this just marketing for something that doesn't exist?

In the YouTube video, it is mentioned that this is a product vision based on Trifacta Wrangler in order to show what bamboolib is aiming for.

Is 'product vision' your words to rationalize marketing something that is completely made up and hasn't been started yet? Maybe you should save your lying HN submissions and market something when it exists.

Very interesting project indeed! Thanks for the effort put into this, I know many data scientists who really like it.

Where exactly are they getting hold of bamboolib? There is nothing in the GitHub repo, and as far as anyone has seen, no development has been done.

Great, what is their background? And why exactly do they like it?

Mostly academia. Because of the ease of use, I think. If you would like, I could pass a questionnaire on to them, asking which features they would most like to see.

That would be great! If you need any help or support, please let us know :)

So Trifacta on top of Pandas. OK.

Yep. Some similarity to TFDV too, but the UI here looks to be more or less lifted directly from Trifacta/Cloud Dataprep.

Pro: Trifacta can be slow, and part of that might be the way it stores the data (I'm assuming JS data structures); if so, Pandas/Bamboolib could improve that.

Con: Trifacta/Cloud Dataprep is directly integrated with Cloud Dataflow and can handle jobs that would crash Pandas.

Thank you for pointing out TFDV (TensorFlow Data Validation) - I had not seen it so far.

And yes, as I say in the video, we used the Trifacta Wrangler Free Version to illustrate the vision of what we aspire to build. In the end, it will look different of course, and we have some ideas on where we would imagine a completely different user interface. Whether this will be better or worse remains to be seen.

And thank you for the comparison of Trifacta and pandas. I agree that pandas won't be able to handle arbitrarily large datasets. However, I wonder if the dataset size can be increased if we also work in the cloud on machines with more RAM. Or maybe we could even export Dask code instead of pandas code.

So, you seem to have experience working with Trifacta Wrangler. Is there something that you don't love about their solution?

It's slow, first and foremost; while I'm not 100% sure on the internals, I think that's because it's doing these operations on JS data structures in the browser, so pandas could be up to a few orders of magnitude faster out of the box.

Good to know, thank you for this!

Really like the idea and have been bouncing a similar idea in my head for a while.

I think there's great value in making sure the product is a bit more intuitive to unlock some interesting markets, beyond data science.

Would be great talking to you if you are interested :)

Sure, drop me an email: valentin at onload dot ie

Another project that could be used for inspiration here and worth checking is http://openrefine.org/

A GUI for viewing dataframes is not too bad an addition, though pandas is a very scope-creepy project. Soon it will be able to send email.

Wait, is this for actual Pandas?

Those guys seem to be working on another lib called edaviz. Also sounds very interesting.

This is correct. Our work on edaviz led us to think about bamboolib. What exactly about edaviz do you find interesting?

Is this free?

Yes, there will be a free version. In addition, we are thinking about how to add suitable premium services in order to fund and extend development, because we have many ideas on how to improve the Python Data Science experience :) What are your thoughts on this?

Maybe you should stop lying to people first.

Will I be able to export the pandas code?

Yes, you can first transform the dataframe via the GUI and afterwards you can export the resulting pandas code. So you can reproduce all the results.

"... if and when we actually write it."
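
For anyone wondering what "transform in a GUI, then export pandas code" could mean in practice, here is a minimal sketch of the general idea. It is purely hypothetical: nothing here comes from bamboolib (which has no published code), and the `Recorder` class and its method names are made up for illustration. Each simulated GUI action is stored as a line of ordinary pandas code, and the "export" is just those lines joined together. The emitted lines use real pandas calls (boolean indexing, `DataFrame.drop`, `DataFrame.sort_values`), but the recorder itself only builds text and does not need pandas installed.

```python
# Hypothetical sketch only: these names are NOT bamboolib's API.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Recorder:
    """Collects simulated GUI actions as lines of plain pandas code."""
    var: str = "df"  # name of the dataframe variable in the exported code
    lines: List[str] = field(default_factory=list)

    def filter_rows(self, column: str, op: str, value: Any) -> "Recorder":
        # Emits a boolean-indexing line, e.g. df = df[df['age'] > 30]
        self.lines.append(
            f"{self.var} = {self.var}[{self.var}[{column!r}] {op} {value!r}]"
        )
        return self

    def drop_columns(self, *columns: str) -> "Recorder":
        # Emits a DataFrame.drop(columns=[...]) line
        self.lines.append(
            f"{self.var} = {self.var}.drop(columns={list(columns)!r})"
        )
        return self

    def sort_by(self, column: str, ascending: bool = True) -> "Recorder":
        # Emits a DataFrame.sort_values(...) line
        self.lines.append(
            f"{self.var} = {self.var}.sort_values({column!r}, ascending={ascending})"
        )
        return self

    def export(self) -> str:
        # The "exported code" is simply the recorded lines, in order
        return "\n".join(self.lines)


# Simulate three clicks in a GUI, then export the equivalent code.
steps = (
    Recorder()
    .filter_rows("age", ">", 30)
    .drop_columns("tmp")
    .sort_by("age", ascending=False)
)
print(steps.export())
```

The point of keeping the export as plain text is that the result can be pasted into a notebook and rerun without the GUI, which is the reproducibility argument made above. Swapping the templates for Dask equivalents would be one way to approach the larger-than-memory case mentioned earlier in the thread, though that is speculation on my part.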

I think this is a very interesting direction for time consuming data prep work. Love the idea of combining GUI elements for speed and code for flexibility.

Thank you :) How many hours did you spend working with pandas last week?

