Show HN: Mito – Write Python/Pandas faster by editing a spreadsheet, in Jupyter (trymito.io)
101 points by aarondia on March 7, 2021 | 29 comments



Hi HN! I'm here with my co-founders (narush + jacobdi) to show you Mito (https://trymito.io/) — a point-and-click data science Jupyter Lab extension that automatically turns your analysis into Python.

We started building Mito 6 months ago after finishing up undergrad engineering + business school (aka Excel School). We became comfortable doing data analysis in visual environments, but were held back by Excel's 1M row limit + the inability to create repeatable processes. Doing analyses in Python was wayyy more powerful, but also required tons of trips to Stack Overflow + pandas documentation.

After a few months of building, pivoting (our vision) + lots of refactoring, Mito now supports writing spreadsheet formulas, pivoting (dataframes), merging, saving + applying macros, and the tiniest bit of graphing. And it generates the equivalent pandas code in real time for all of it :)

You can download Mito by following our always-WIP documentation [0]. We'd love to hear your first impressions of the tool (especially if you download it) + your experiences in/building for the data science/analytics community.

[0] https://docs.trymito.io/getting-started/installing-mito


Are you working on this full time?


We are! We started working on it when school moved online due to Covid, and have been working on it full time since graduation.


Congrats on the launch! Tool seems super useful –– I hate having to memorize/wade through all of Pandas docs/flags to get something done. Especially indexing and grouping...


Thanks -- definitely agree. Grouping/Pivot Tables was one of the first features we added because the earliest users were showing us Jupyter Notebooks full of pandas pivot tables. But when they had small datasets, they'd always use Excel instead because its interface is so much easier to use.


If you like Mito, here are some other, similar tools that you might like:

Open-Source:

- dtale: Very similar to Mito. On top of Mito, it provides basic data exploration and can also be launched from within VSCode. Also exports pandas code but not inline into a cell.

- pandasgui: alternative that does not export code, yet.

Not Open-Source:

- bamboolib: very similar to Mito e.g. also has code export into a cell. The basic version is free on local Jupyter Notebook and Lab. bamboolib does NOT allow inline writing of Excel-style formulas like ROUND(A1, 2) like Mito does. On top of Mito, it supports more pandas functions e.g. also datetime handling. Data explorations for the whole table and columns. It has a plot creator for creating Plotly graphs. It does not log any user data - neither about feature usage nor about the actual data. It also works in Jupyter Notebook. Enterprise customers love the ability to extend bamboolib with plugins in order to add their own custom plots or data transformations. Also, bamboolib supports data loaders e.g. to load CSV files from a GUI - Mito currently seems only to work when the data is already available in a Dataframe variable. With bamboolib the user does not have to code anything in order to spawn the UI. The user can just type the name of the dataframe. For Mito the user needs to type mitosheet.sheet(df_name). bamboolib is more mature because it has been in development for roughly 2.5 years and has many enterprise customers like Spotify, Bain & Company, Procter & Gamble and 2 of the top 10 global asset managers.

Full disclosure: I am a co-founder of bamboolib


When seeing products like this, and people who find it useful, it strikes me how there are apparently many types of 'data users' in Python. This seems the polar opposite of what I would find useful with data. The point-and-click is very much what I'm trying to avoid.

Not questioning the need for this, rather being continually surprised how diverse the Python userspace is, even within a "field", like "data science".


Really interesting perspective - and totally agree, there is a huge diversity within Python users along many dimensions!

If I can ask - why do you try and avoid point and click tools?


As I see it, with point-and-click, simple things are simple, medium things are quite hard and complex things are nigh-impossible. With code-based tools, simple things involve a bit of effort, medium things a little more effort and complex things are only a little harder still.

Maybe it’s my perception, or maybe a function of the kind of tasks I do, dunno, but I spend a fair amount of effort getting away from GUIs whenever I can


Very, very cool! I love the code-generation bit, and I've been wanting to add something like it to https://github.com/plotly/jupyterlab-chart-editor for a long time! Is this functionality based on something open-source? I've seen https://github.com/mkery/JupyterLab-CodeAnalysisDemo ... is this related?


Please don't use span on links on your web page, my mouse cursor didn't change, and it was very strange.

Maybe I'm too snobbish, but I don't trust a company with my data that doesn't know basic HTML (I would install the local version though).

Also the github links don't work (npmjs and python repositories show that it's using BSD license): https://github.com/mito/mito


Thanks for both of those tips, and totally understand the concerns that come along with it. Will push a fix right now!


Hey HN - Super excited to hear any feedback y'all have on Mito or our docs. Happy to answer any questions here!


Super cool, congrats to the team! I can't count how many times I've had to open the merging section of the pandas docs. Great library but I find a lot of it unintuitive and hard to commit to memory.


The merging docs are only trumped in unintuitiveness by the .iloc docs :)


Awesome product, you should be proud! Sent this to some of my data whiz friends && I know they'll appreciate it. I'm curious - what are the top use cases of Mito amongst your current users?


Congrats on the launch folks - I spend tons of time in spreadsheets and can think of a few use cases for this (handling more regular repetitive tasks and processes). Looks useful


Heyo! Looks super cool and useful. Are you open source?


Mito is not currently open source, although it is available through PyPI. You can follow our installation instructions https://docs.trymito.io/getting-started/installing-mito to get set up locally.

Or if you don't have Jupyter already set up on your computer, we have a hosted version of Jupyter Lab that you can make an account on.


This is very confusing. The PyPI page says it's supposed to be BSD licensed:

https://pypi.org/project/mitosheet/#files

You can get the source code for something from that page, but it seems you are trying to steer people away from it as much as possible. At least, it doesn't seem to be on github (the horror!).

Is this just because you haven't cleaned up the source code as much as you would like, or does the open-source portion need some proprietary component to work?


Right, even the headers on all the source files claim BSD...


Congrats on the launch. The product looks useful.

However, I installed the package locally, and I see that importing it causes a request to be sent to segment.io.


Good catch. There's another more detailed response I just left to a similar comment, but the TLDR is that we do some logging of metadata of the dataframes passed to the tool, and what features are used.

If you're interested in using the tool without any logging, just let us know and we can make sure to turn it off for you!


How did you make the translation part of it?


There are two types of translation that occur in Mito: writing spreadsheet formulas, and then everything else (merging, pivoting, etc.)

For the spreadsheet formula part, we parse the formula in the Jupyter Python kernel and convert it to Pandas code. We actually found a bit of a trick to make this easier. TLDR: We defined Python functions that have the same names as the spreadsheet formulas, which helps us avoid writing a formal language grammar + syntax tree. If you're interested in reading about it a bit more, we actually have a short blog post that you can check out: https://trymito.io/blog/transpiler
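
To make the trick above concrete, here is a minimal sketch of the general idea. It is not Mito's actual code: the `transpile` helper, the regex, and the `ROUND`/`SUM` definitions are all assumptions made up for illustration. Because the Python functions share their names with the spreadsheet functions, the formula only needs its column references rewritten to become a valid line of pandas code.

```python
import re
import pandas as pd

# Hypothetical sheet functions (not Mito's): same names as the spreadsheet
# functions, but operating on whole pandas Series instead of single cells.
def ROUND(series, decimals=0):
    return series.round(decimals)

def SUM(*series):
    return sum(series)

SHEET_FUNCTIONS = {"ROUND": ROUND, "SUM": SUM}

def transpile(formula, target, columns, df_name="df"):
    """Rewrite column names in a formula so it becomes a line of pandas code."""
    expression = formula.lstrip("=")

    def replace(match):
        token = match.group(0)
        # Column names become df['name']; function names are left untouched.
        return f"{df_name}['{token}']" if token in columns else token

    expression = re.sub(r"[A-Za-z_]\w*", replace, expression)
    return f"{df_name}['{target}'] = {expression}"

df = pd.DataFrame({"A": [1.234, 2.5], "B": [1.0, 2.0]})
code = transpile("=ROUND(A + B, 2)", "C", list(df.columns))
print(code)

# The generated line runs as ordinary Python because ROUND is a real function.
exec(code, {**SHEET_FUNCTIONS, "df": df})
print(df)
```

The sketch ignores string literals, cell ranges, and column names that collide with function names, all of which a real tool would have to handle.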

The second type of translation (merging, pivoting, etc.) sends a message to the backend with the parameters configured in the point-and-click tool. The backend then executes the equivalent Pandas code in the Python kernel and writes that code to the Jupyter cell.
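
As a rough sketch of that second path: the message shape, field names, and `generate_code` helper below are hypothetical, since the thread does not describe Mito's actual protocol. The only real API assumed is pandas' `DataFrame.merge` and `DataFrame.pivot_table`, which appear in the generated strings.

```python
import json

# Hypothetical message from the point-and-click frontend. The thread only
# says that the configured parameters are sent to the backend.
message = json.dumps({
    "step": "merge",
    "left": "orders",
    "right": "customers",
    "on": "customer_id",
    "how": "left",
    "result": "merged",
})

def generate_code(raw_message):
    """Turn a UI message into the equivalent line of pandas code."""
    params = json.loads(raw_message)
    if params["step"] == "merge":
        return (f"{params['result']} = {params['left']}.merge("
                f"{params['right']}, on={params['on']!r}, how={params['how']!r})")
    if params["step"] == "pivot":
        return (f"{params['result']} = {params['df']}.pivot_table("
                f"index={params['index']!r}, values={params['values']!r}, "
                f"aggfunc={params['aggfunc']!r})")
    raise ValueError(f"unsupported step: {params['step']}")

code = generate_code(message)
print(code)
```

In a design like this, the same generated string can be both executed in the kernel and written into the notebook cell, so what the user sees is exactly what ran.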


I was really interested in this as an easy way to help people get to grips with the intricacies of Pandas and working with dataframes, but after looking over the source code that's published on PyPI I don't think I can ever recommend this package to anyone due to the degree of tracking that's present in the code.

There's no mention of the Segment tracking in the docs, and I don't see any way for the user to opt out of it, which I think is an immediate GDPR issue.

Given that you are logging metadata about the dataframes in use along with the user email and name of the logged in user, I can't see this ever being used in an environment where sensitive data is being processed, since it could potentially leak PII that's easily tied to a given company via the email address.

This is a great idea, and I think if you can go with the BSD license and provide a way for people to opt out of tracking (or ideally flip it and allow them to opt in) this could be used in any number of industries. As it stands currently I just don't think this will ever pass a data audit at any large company which is a real shame.


Thanks for that callout. We appreciate the perspective and will work on adding more disclosures of where logging does occur, making it easier for the user to opt out, and review what we are logging and how we can reduce it. I think your analysis is correct, but as a summary: we log the email provided by the user when they first create a mitosheet, the size of the dataframe, the header names of the dataframe, and then the interactions with the UI.

For our current users who have told us that they are not comfortable with logging, we have been able to turn off logging for their specific accounts. So if you're interested in continuing to check out the tool while we make those improvements, just let us know.


Seems pretty cool. Does this only work in Jupyter?


Currently, Mito only works in Jupyter Lab version 2.0. We don't yet work in Jupyter Notebooks, Jupyter Lab 3.0, or Google Colab. However, we'd love to expand to those in the future.

Since we only support Jupyter right now, about half of the early Mito users are using it locally and the other half are using it on a hosted version of Jupyter Lab, which just makes it really easy to get set up without worrying about Python installations.



