Hi HN! I'm here with my co-founders (narush + jacobdi) to show you Mito (https://trymito.io/) — a point-and-click data science Jupyter Lab extension that automatically turns your analysis into Python.
We started building Mito 6 months ago after finishing up undergrad engineering + business school (aka Excel School). We became comfortable doing data analysis in visual environments, but were held back by Excel's 1M row limit + the inability to create repeatable processes. Doing analyses in Python was wayyy more powerful, but also required tons of trips to Stack Overflow + pandas documentation.
After a few months of building, pivoting (our vision) + lots of refactoring, Mito now supports writing spreadsheet formulas, pivoting (dataframes), merging, saving + applying macros, and the tiniest bit of graphing. And it generates the equivalent pandas code in real time for all of it :)
You can download Mito by following our always-WIP documentation [0]. We'd love to hear your first impressions of the tool (especially if you download it) + your experiences in/building for the data science/analytics community.
Congrats on the launch! Tool seems super useful –– I hate having to memorize/wade through all of Pandas docs/flags to get something done. Especially indexing and grouping...
Thanks -- definitely agree. Grouping/pivot tables were one of the first features we added because the earliest users were showing us Jupyter Notebooks full of pandas pivot tables. But when they had small datasets, they'd always use Excel instead because its interface is so much easier to use.
If you like Mito, here are some other, similar tools that you might like:
Open-Source:
- dtale: Very similar to Mito. In addition to what Mito offers, it provides basic data exploration and can also be launched from within VSCode. It also exports pandas code, but not inline into a cell.
- pandasgui: alternative that does not export code, yet.
Not Open-Source:
- bamboolib: very similar to Mito, e.g. it also exports code into a cell. The basic version is free on local Jupyter Notebook and Lab. bamboolib does NOT allow inline writing of Excel-style formulas like ROUND(A1, 2) the way Mito does. Beyond what Mito offers, it supports more pandas functions, e.g. datetime handling, and provides data exploration for the whole table and for individual columns. It has a plot creator for building Plotly graphs. It does not log any user data, neither about feature usage nor about the actual data. It also works in Jupyter Notebook. Enterprise customers love the ability to extend bamboolib with plugins to add their own custom plots or data transformations. bamboolib also supports data loaders, e.g. to load CSV files from a GUI; Mito currently seems to work only when the data is already available in a DataFrame variable. With bamboolib the user does not have to write any code to spawn the UI; they can just type the name of the dataframe. For Mito the user needs to type mitosheet.sheet(df_name). bamboolib is more mature because it has been in development for roughly 2.5 years and has many enterprise customers like Spotify, Bain & Company, Procter & Gamble and 2 of the top 10 global asset managers.
When seeing products like this, and people who find it useful, it strikes me how there are apparently many types of 'data users' in Python. This seems the polar opposite of what I would find useful with data. The point-and-click is very much what I'm trying to avoid.
Not questioning the need for this, rather being continually surprised how diverse the Python userspace is, even within a "field", like "data science".
As I see it, with point-and-click, simple things are simple, medium things are quite hard and complex things are nigh-impossible. With code-based tools, simple things invoke a bit of effort, medium things a little more effort and complex things are only a little harder still.
Maybe it’s my perception, or maybe a function of the kind of tasks I do, dunno, but I put a fair amount of effort into moving away from GUIs whenever I can.
Super cool, congrats to the team! I can't count how many times I've had to open the merging section of the pandas docs. Great library but I find a lot of it unintuitive and hard to commit to memory.
Awesome product, you should be proud! Sent this to some of my data whiz friends && I know they'll appreciate it. I'm curious - what are the top use cases of Mito amongst your current users?
Congrats on the launch folks - I spend tons of time in spreadsheets and can think of a few use cases for this (handling more regular repetitive tasks and processes). Looks useful
You can get the source code for something from that page, but it seems you are trying to steer people away from it as much as possible. At least, it doesn't seem to be on github (the horror!).
Is this just because you haven't cleaned up the source code as much as you would like, or does the open-source portion need some proprietary component to work?
Good catch. There's another more detailed response I just left to a similar comment, but the TLDR is that we do some logging of metadata of the dataframes passed to the tool, and what features are used.
If you're interested in using the tool without any logging, just let us know and we can make sure to turn it off for you!
There are two types of translation that occur in Mito: writing spreadsheet formulas, and everything else (merging, pivoting, etc.).
For the spreadsheet formula part, we parse the formula in the Jupyter Python kernel and convert it to pandas code. We actually found a bit of a trick to make this easier. TLDR: we defined Python functions that have the same names as the spreadsheet formulas, which lets us avoid writing a formal language grammar + syntax tree. If you're interested in reading a bit more about it, we have a short blog post you can check out: https://trymito.io/blog/transpiler
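To make the trick concrete, here's a minimal sketch of the general idea. It is not Mito's actual implementation (the blog post covers that). It assumes column names are simple identifiers, and `ROUND`, `UPPER`, `formula_to_python` and `apply_formula` are hypothetical names chosen for illustration. Because the spreadsheet functions exist as ordinary Python functions, a formula only needs its column references rewritten before it can run as a Python expression.

```python
import re

import pandas as pd


# Hypothetical spreadsheet functions, named exactly like the formulas,
# so a formula string can be evaluated as a Python expression.
def ROUND(value, digits=0):
    """Round a pandas Series (or a plain number) to `digits` decimal places."""
    if isinstance(value, pd.Series):
        return value.round(digits)
    return round(value, digits)


def UPPER(value):
    """Uppercase every string in a pandas Series."""
    return value.str.upper()


FUNCTIONS = {"ROUND": ROUND, "UPPER": UPPER}


def formula_to_python(formula: str, columns: list) -> str:
    """Rewrite '=ROUND(A, 2)' as "ROUND(df['A'], 2)".

    Assumes column names are simple identifiers; no grammar or syntax tree
    is built, only column references are replaced.
    """
    expr = formula.lstrip("=")
    if not columns:
        return expr
    # Longest names first so the alternation tries the most specific column
    # name before its prefixes; a single re.sub pass never rescans output.
    names = sorted(columns, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    return re.sub(pattern, lambda m: f"df[{m.group(1)!r}]", expr)


def apply_formula(df: pd.DataFrame, new_column: str, formula: str) -> str:
    """Add `new_column` to `df` from a formula and return the generated code."""
    code = f"df[{new_column!r}] = {formula_to_python(formula, list(df.columns))}"
    # The spreadsheet functions are globals; `df` is the only local name.
    exec(code, dict(FUNCTIONS), {"df": df})
    return code
```

For example, `apply_formula(df, "B", "=ROUND(A, 2)")` adds column `B` to `df` and returns the line `df['B'] = ROUND(df['A'], 2)`, which is the kind of code that could then be shown to the user.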
The second type of translation (merging, pivoting, etc.) sends a message to the backend with the parameters configured in the point-and-click tool. The backend then executes the equivalent pandas code in the Python kernel and writes that code to the Jupyter cell.
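A rough sketch of that second path, assuming a message shaped like `{"type": ..., "params": {...}}`. The message format and the `merge_code`, `pivot_code` and `handle_message` helpers are hypothetical, not Mito's real protocol. Each handler turns UI parameters into one line of pandas code, which is executed in the kernel namespace and returned so it can be written into the cell.

```python
import pandas as pd


# Hypothetical handlers: each turns the parameters a point-and-click UI might
# send into one line of pandas code. Names in params are assumed to be valid
# Python identifiers (a real tool would validate them before executing).
def merge_code(params: dict) -> str:
    return (
        f"{params['output']} = {params['left']}.merge("
        f"{params['right']}, how={params['how']!r}, "
        f"left_on={params['left_on']!r}, right_on={params['right_on']!r})"
    )


def pivot_code(params: dict) -> str:
    return (
        f"{params['output']} = {params['df']}.pivot_table("
        f"index={params['index']!r}, values={params['values']!r}, "
        f"aggfunc={params['aggfunc']!r}).reset_index()"
    )


HANDLERS = {"merge": merge_code, "pivot": pivot_code}


def handle_message(message: dict, namespace: dict) -> str:
    """Generate code for one UI step, run it in the kernel namespace, return it."""
    code = HANDLERS[message["type"]](message["params"])
    exec(code, namespace)  # run the exact code that will be shown to the user
    return code  # this string is what would be written into the notebook cell
```

Generating a code string first and then executing that same string keeps the cell contents and the actual computation from drifting apart. Since the parameters are interpolated directly into executed code, a real implementation would also need to validate them.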
I was really interested in this as an easy way to help people get to grips with the intricacies of pandas and working with dataframes, but after looking over the source code that's published on PyPI I don't think I can ever recommend this package to anyone due to the degree of tracking that's present in the code.
There's no mention of the Segment tracking in the docs, and I don't see any way for the user to opt out of it, which I think is an immediate GDPR issue.
Given that you are logging metadata about the dataframes in use along with the email and name of the logged-in user, I can't see this ever being used in an environment where sensitive data is being processed, since it could potentially leak PII that's easily tied to a given company via the email address.
This is a great idea, and I think if you can go with the BSD license and provide a way for people to opt out of tracking (or ideally flip it and allow them to opt in) this could be used in any number of industries. As it stands currently I just don't think this will ever pass a data audit at any large company which is a real shame.
Thanks for that callout. We appreciate the perspective and will work on adding more disclosures of where logging does occur, making it easier for the user to opt out, and review what we are logging and how we can reduce it. I think your analysis is correct, but as a summary: we log the email provided by the user when they first create a mitosheet, the size of the dataframe, the header names of the dataframe, and then the interactions with the UI.
For our current users who have told us that they are not comfortable with logging, we have been able to turn off logging for their specific accounts. So if you're interested in continuing to check out the tool while we make those improvements, just let us know.
Currently, Mito only works in Jupyter Lab version 2.0. We don't work in Jupyter Notebooks, Jupyter Lab 3.0, or Google Colab. However, we'd love to expand to those in the future.
Since we only support Jupyter right now, about half of the early Mito users are using it locally and the other half are using it on a hosted version of Jupyter Lab, which makes it really easy to get set up without worrying about Python installations.
[0] https://docs.trymito.io/getting-started/installing-mito