Launch HN: Mito (YC S20) – Edit a spreadsheet, generate Python
140 points by narush on Sept 5, 2022 | 27 comments
Hiya HN, I'm Nate, cofounder of Mito (https://trymito.io) with my best friends Jake and Aaron. Mito is a spreadsheet UI that runs inside a Jupyter Notebook. Each time you edit the spreadsheet, it generates Python code for that edit. This allows analysts to write Python scripts using an interface they are familiar with, instead of waiting months for eng resources.

Mito is open core: http://github.com/mito-ds/monorepo. Our docs are at http://docs.trymito.io, and you can download it here: https://docs.trymito.io/getting-started/installing-mito.

Most people doing data analysis in Python struggle to write even basic Python. If you search StackOverflow for the [pandas] tag, you’ll find pandas users wrestling with everything from “how can I make a pivot table?” to “how do I import from another folder?” These users are experts in their field; they just aren’t experts in Python. Tasks that take them seconds in spreadsheets can end up taking them days. (Here’s how we put it to investors: the next 10 million Python programmers are transitioning from Excel and have one real problem: writing the damn code.) A lot of organizations are stuck in this dilemma: they want to move from spreadsheets to Python, but getting started with programming, even with a highly usable language like Python, is hard.

We’ve spent years with users trying to adapt their spreadsheet skills to Python. It takes weeks to learn the basics, and their existing skills don't transfer. Many of their needs are simple to do in a spreadsheet (writing a formula, aggregating data, graphing), but adapting them to Python requires long courses, emails to internal support (if any exists) with days-long waits for a reply, and countless trips to Stack Overflow. Often they just give up and return to Excel, but that makes them dependent on IT to write code for them. One of our users was quoted a full year for IT to implement a simple report! (Fast-forward: he ended up using Mito to automate it himself in less than a week.)

We went through this ourselves when we went to college together, studying engineering and business. We first learned data science with spreadsheets, then had to relearn it in Python. The transition was painful—basic Excel was much easier! Of course, not-so-basic Excel soon becomes not-so-easy, which is what drives the move to Python in the first place.

With our interest in spreadsheets, we started a spreadsheet-version-control company at the end of college, and spent a year working with Excel power users. Eventually, we realized that version control was secondary to the real problems users faced with spreadsheets: limited data size, speed limits, lack of advanced functionality, and a horrible replayability story.

Essentially, enterprises are caught between a rock (their spreadsheet woes) and a hard place (the pain of moving analysts to Python). We decided to work on this instead, and started Mito.

Mito is a spreadsheet UI built as an extension to Jupyter Notebooks / JupyterLab. Using a Mito spreadsheet, users can import data, add and delete columns, write formulas like Excel, make pivot tables, generate graphs, and more. See our docs (http://docs.trymito.io) for all our functionality.

Every tab in a Mito spreadsheet is a different pandas DataFrame. For each edit you make, Mito generates the corresponding line of pandas code in a code cell directly below the spreadsheet. For example, if I use Mito to import a CSV, add a column named Day of week, and use the WEEKDAY formula from Excel to pull out the weekday from another column, Mito generates the following code:

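  # Imported tesla stock.csv
  import pandas as pd
  # Assumption: WEEKDAY is a Mito sheet function that ships with the mitosheet package, not part of pandas
  from mitosheet import *
  tesla_stock = pd.read_csv(r'tesla stock.csv')
  
  # Added column Day of week
  tesla_stock.insert(1, 'Day of week', WEEKDAY(tesla_stock['Date']))
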
In practice, the typical user bounces back and forth between writing Python and using the Mito spreadsheet, depending on the task at hand. We think this fluid movement between a spreadsheet and Python is really cool. The spreadsheet backend is just a Python extension to the IPython kernel you’re already running for your Jupyter Notebook. Because Mito is just a Python package, all data processing happens locally.
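
To make the edit-to-code idea concrete, here is a minimal sketch assuming each spreadsheet edit is recorded as a small step object that knows how to print its own pandas line. The `ImportCSV`/`AddColumn` classes and the `transpile`/`generate_script` functions are hypothetical illustrations, not Mito's actual internals, and the example swaps Mito's WEEKDAY for plain pandas (`dt.day_name()`) so the generated script needs nothing beyond pandas.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ImportCSV:
    """Hypothetical record of 'import a CSV into a new tab'."""
    df_name: str
    path: str


@dataclass
class AddColumn:
    """Hypothetical record of 'add a column computed from a pandas expression'."""
    df_name: str
    column: str
    position: int
    expression: str  # pandas expression kept as source text


Edit = Union[ImportCSV, AddColumn]


def transpile(edit: Edit) -> List[str]:
    """Turn one spreadsheet edit into a comment plus the pandas line for it."""
    if isinstance(edit, ImportCSV):
        return [
            f"# Imported {edit.path}",
            f"{edit.df_name} = pd.read_csv({edit.path!r})",
        ]
    if isinstance(edit, AddColumn):
        return [
            f"# Added column {edit.column}",
            f"{edit.df_name}.insert({edit.position}, {edit.column!r}, {edit.expression})",
        ]
    raise TypeError(f"Unsupported edit: {edit!r}")


def generate_script(edits: List[Edit]) -> str:
    """Concatenate the code for every edit, in the order the user made them."""
    lines = ["import pandas as pd", ""]
    for edit in edits:
        lines.extend(transpile(edit))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    edits = [
        ImportCSV("tesla_stock", "tesla stock.csv"),
        AddColumn(
            "tesla_stock",
            "Day of week",
            1,
            "pd.to_datetime(tesla_stock['Date']).dt.day_name()",
        ),
    ]
    print(generate_script(edits))
```

Keeping each edit as data rather than as code is what makes the result replayable: running `generate_script` over the same list of edits always reproduces the same pandas script.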

As mentioned, Mito is an open core product. 90% of the code is AGPL licensed. The rest is under a separate enterprise license. These modules are still source-visible, but require users to pay for a pro or enterprise offering before using them. That’s basically our business model.

We have 3 versions (https://trymito.io/plans): (1) Free: basic analysis tools, as well as some basic telemetry that you can opt out of; (2) Pro: all of (1), with advanced functionality; (3) Enterprise: all of (2), with more advanced features, optimizations, and support.

Because spreadsheets are sprawling pieces of software, we’re pretty obsessed with optimizing for long-term development. We use strong types where we can (TypeScript on the frontend, fairly comprehensive MyPy in Python). We’ve implemented our own library of common components from scratch, which lets us stay flexible during large refactorings. We also implemented our own custom JavaScript grid, hyper-optimized for our use case; it is the fastest JS grid we tested in our context. We're also big fans of metaprogramming: we write an increasing amount of code that writes code for us, which in turn makes it easy to add more functionality to our spreadsheet.

We posted about Mito a long time ago: https://news.ycombinator.com/item?id=24305615. No one really liked it (we learned our lesson!), and it didn't do much at the time — I think the app had a single button that added a column. Three months ago, someone (not sure who — thank you, alefnula!) posted it again: https://news.ycombinator.com/item?id=31446236. It reached the top 3 and we got lots of comments—yay! Since then, we’ve doubled the number of features (mostly data processing), done a UI overhaul, dramatically expanded the Pro + Enterprise offering, made telemetry optional in the free version, and more.

We’d love to hear all about your experiences with spreadsheet analysis, the uncanny valley between spreadsheets and code, the travails of moving enterprise analytics off of spreadsheets, and whatever else you’d like to ask or mention. Any and all feedback is greatly appreciated!




Wouldn't actually using Excel create less friction for potential users?

Your target audience is theoretically Excel users who want/need to code instead, but I think you're alienating the power users of Excel, because their power tools are unavailable in the Mito spreadsheet editor.

For example, have you considered dumping the dataframes to "smart" xlsx files with backing code that connects to a local server, listens to worksheet events and tells the server everything that happens so it can write python code in the source notebook?


We've thought a lot about this one. It's a good idea for usability - agree with you there - but there are some development complexities that make it hard for other reasons.

We spent a considerable amount of time two years ago developing Excel extensions for our spreadsheet-version-control product. It was... not ideal from a development perspective.

The benefits of being in Excel (it has all the features!) are also the cost of being in Excel (you have to support all the features!). This means v1 of the extension you describe would either have to be non-functional on most of the power tools you mention, or we'd need to spend years building in stealth mode before launching something fully working (and I'm not even sure we ever could get there... Excel is... literally so big).

Also, the actual extension points for Excel are not as fully-featured as you might think! In practice, we'd likely have to gate much of Excel's functionality to get an extension that actually works -- there are some hard limits to what you can extend, further making it really hard to actually support these power tools in practice.

Also, for the sake of our users, we love being in a Python development environment! In practice, many of our users move really fluidly back and forth between writing Python and editing a Mito spreadsheet. Effectively - bring a spreadsheet in for what it's good at, when you want it.

We'll keep considering this one, though -- I have a _feeling_ Microsoft might make some Python moves in Excel the next few years... :)


> We spent a considerable amount of time two years ago developing Excel extensions for our spreadsheet-version-control product. It was... not ideal from a development perspective.

So here’s the thing. You develop once, people use many.

So the point of dev is to do the not ideal things so the many users don’t have to. Suck it up so users have it easy. Software that doesn’t make users change, that doesn’t get in the way of a career of learning to bend Excel to their will, they’ll throw money at you for that.


I could have been more clear with my language: "not ideal" doesn't mean that it was unpleasant (the hot-reload loop was actually not awful).

It was more that it's technically infeasible, given what Excel exposes to you as an extension developer. That is, the gates they add to their ecosystem make it pretty much impossible, which is another one of the reasons we're open source!

Will clarify my language around this going forward!


Excel workflows are terrible though. No version control, hard to test, prone to indexing errors. And doing very sophisticated things with it gets hard; lots of financial analysts/quants are moving over to Python for analysis anyway.

If you’re thinking about this in isolation, I can see why it would seem a bad idea to move power-excel users to Python. But take this in the context of a much wider shift where many shops are already shifting to Python for other reasons, and so we need a way to help transition the Excel power users over too.

Excel has its place for sure, but I think it’s interesting to consider whether another tool paradigm could gradually replace it; we would need to really hone the flexibility and expressivity of the UI for simple tasks. The benefit would be that when your task grows you don’t need to re-implement it in a new Python engine.


Aaron here, one of the Mito co-founders.

+1, beyond the most obvious reasons that companies are moving away from Excel (too much data to process, not enough robust automation features), there are important workflow management reasons that companies are making the transition.

More and more, we're hearing that companies want to use software engineering practices on their data analytics workflows -- things like version control, easily understanding what edits are applied by looking at the code, and even things like CI to automatically build dashboards from the most up to date data.

While you technically could build tooling around Excel to do a lot of these things, it's much easier in the Python ecosystem, where that tooling already exists.


That's an interesting product. What is the advantage over Power Query for a non technical user?

There is a slightly different idea on the same theme that I'll take a stab at one day: let users express their logic in Excel using standard Excel formulas, then tell a tool which ranges are the inputs and which are the outputs; the tool extracts the logic between them (by following the formulas) and generates the equivalent code. This would allow a user to express and maintain, in Excel, logic that IT can run with no dependency on Excel.
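
A minimal sketch of the tool described above, under heavy simplifying assumptions: single cells rather than ranges, formulas built only from cell references, numbers, `+ - * /` and parentheses, no absolute (`$A$1`) references, and no circular references. `trace_to_python` is a hypothetical name; it walks back from the output cell, emits each intermediate cell before the cells that use it, and wraps the result in a plain Python function.

```python
import re
from typing import Dict, List, Set, Union

# A cell holds either a constant or a formula string such as "=A1*B1".
CellValue = Union[int, float, str]

# Matches plain cell references like A1 or BC12 (assumes no function names with digits).
CELL_REF = re.compile(r"\b([A-Z]+[0-9]+)\b")


def trace_to_python(
    cells: Dict[str, CellValue], inputs: List[str], output: str, name: str = "model"
) -> str:
    """Follow formulas from `output` back to `inputs` and emit an equivalent Python function."""
    body: List[str] = []
    visited: Set[str] = set()

    def visit(ref: str) -> None:
        if ref in visited or ref in inputs:
            return
        visited.add(ref)
        value = cells[ref]
        if isinstance(value, str) and value.startswith("="):
            expr = value[1:]
            for dep in CELL_REF.findall(expr):
                visit(dep)  # emit dependencies before the cell that uses them
            python_expr = CELL_REF.sub(lambda m: m.group(1).lower(), expr)
            body.append(f"    {ref.lower()} = {python_expr}")
        else:
            # A hard-coded constant between inputs and output becomes a literal.
            body.append(f"    {ref.lower()} = {value!r}")

    visit(output)
    params = ", ".join(ref.lower() for ref in inputs)
    return "\n".join([f"def {name}({params}):", *body, f"    return {output.lower()}"])


if __name__ == "__main__":
    sheet = {"A1": 100, "B1": 0.25, "C1": "=A1*B1", "D1": "=A1-C1+E1", "E1": 5}
    print(trace_to_python(sheet, inputs=["A1", "B1"], output="D1"))
```

For that small sheet, the emitted function assigns `c1`, then `e1`, then `d1`, and returns `d1`, so it can be run anywhere Python runs without opening the workbook.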


For some non-technical users, Power Query is a better option. If your main purpose is to work with a large data set and then update a PowerBI dashboard, for example, then Power Query sounds like a perfect solution.

But what we see is that there are a bunch of reasons that these non-technical users are excited about Python specifically. Here are two examples:

1. One of the first adopters of Mito, let's call her Shelly, is helping a team of engineers build out a Salesforce dashboard to predict when customers are going to refill an order. Since the engineers don't have the business context to figure out how to make that prediction, it's Shelly's job to construct the (in this case pretty simple) algorithm by querying the relevant database, figuring out which fields are accurate (there might be 5 different date fields and figuring out which one is actually when the user last placed an order isn't as easy as it sounds), and then making the prediction for each customer. Shelly then uses the Python code that Mito generates as a communication tool for the engineers. The code is an exact audit log of each transformation she made to her data in order to create the report.

2. Many of the companies that we work with have business-specific metrics that they calculate, so they have an engineering team build a Python package that can easily calculate those metrics. Sometimes they will even provide boilerplate Python code snippets to interact with those packages. (In the future they'll be able to import them into Mito!) It's a win for the employees, who can rely on the code snippets instead of calculating the metrics manually, and it's a win for the engineers, who can write Python code instead of M (Power Query's formula language).

The last thing I'll say is that companies are moving to Python because of the openness and robustness of the Python ecosystem. They are power users of packages like Voila and Plotly. Having employees work in Python opens up a ton of doors for how companies can support them.

Your idea about expressing logic in excel and generating the equivalent code has merit too for a different user base. Let us know when you build it, excited to check it out!


This is quite similar in concept to a spreadsheet product from 2008, called Resolver One, which ran on IronPython.

https://media.prleap.com/image/221/640/share_trade_screensho...

It was excellent, and a bit of a shame that it didn't get more traction at the time.


Very cool, thanks for linking! I’ve never heard of this before - do you know if it generated Python or rather allowed you to edit the spreadsheet with Python?

One thing we’d like to add soon is Python spreadsheet formulas. Often requested - I think it’d be super sweet as well!


It generated the spreadsheet as Python code as you entered the formulae. Then you could add functions of your own design and call those from the cells too.


Very cool! This second part (running Python in the sheet) is coming soon - both allowing you to call custom Python functions in the sheet - but also allowing Enterprises to extend the spreadsheet interface as well!
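
One way a sheet could call user-defined Python functions is a simple registry keyed by formula name. The sketch below assumes formulas of the form `=NAME(Column1, Column2, ...)` whose arguments are column names; `SHEET_FUNCTIONS`, `sheet_function`, and `evaluate` are hypothetical, not Mito's (or Resolver One's) actual API.

```python
import re
from typing import Callable, Dict

import pandas as pd

SheetFn = Callable[..., pd.Series]

# Hypothetical registry mapping upper-cased formula names to Python functions.
SHEET_FUNCTIONS: Dict[str, SheetFn] = {}

# Matches "=NAME(arg1, arg2, ...)" with optional surrounding whitespace.
FORMULA = re.compile(r"^=\s*([A-Za-z_]\w*)\((.*)\)\s*$")


def sheet_function(fn: SheetFn) -> SheetFn:
    """Decorator that exposes a Python function to the sheet under its upper-cased name."""
    SHEET_FUNCTIONS[fn.__name__.upper()] = fn
    return fn


def evaluate(formula: str, df: pd.DataFrame) -> pd.Series:
    """Evaluate a single-call formula whose arguments are column names of `df`."""
    match = FORMULA.match(formula)
    if match is None:
        raise ValueError(f"Unsupported formula: {formula}")
    name, arg_text = match.groups()
    fn = SHEET_FUNCTIONS.get(name.upper())
    if fn is None:
        raise ValueError(f"Unknown sheet function: {name}")
    columns = [arg.strip() for arg in arg_text.split(",") if arg.strip()]
    return fn(*(df[col] for col in columns))


@sheet_function
def margin(revenue: pd.Series, cost: pd.Series) -> pd.Series:
    """Example user-defined function: gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


if __name__ == "__main__":
    data = pd.DataFrame({"Revenue": [100.0, 200.0], "Cost": [60.0, 150.0]})
    data["Margin"] = evaluate("=MARGIN(Revenue, Cost)", data)
    print(data)
```

A real formula language would need a proper parser (nested calls, literals, cell ranges); the registry is the part that lets users or enterprises extend the sheet without touching the formula engine.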


This is cool.

I'm a Python engineer (and ML person). I don't use pandas often, so when I do need it, I am constantly on stackoverflow.com and testing single lines at a time in Jupyter.

I'd love a version of Mito where I could give it the original mock table and the desired output table maybe as a function (not using a spreadsheet UI), and it would propose pandas code for me.


This sounds sweet. Pretty much like Excel's auto-fill functionality, but instead it works on entire tables (and generates Pandas code).

I wonder if there are good heuristic-based approaches for determining this. We could always do some ML code-gen, but I prefer deterministic approaches for their reliability (and because it's easy to grok what failed and why), at least for now!
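
A deterministic, heuristic version of this could be a brute-force search over a small menu of single-step pandas transformations, returning the code of the first one that turns the example input into the example output. This is only a sketch: the candidate list is an assumption, it only finds one-step transformations, and `propose_code` is a hypothetical name.

```python
from typing import Callable, Iterator, Optional, Tuple

import pandas as pd

# Each candidate pairs the pandas code we would show the user with a function that runs it.
Candidate = Tuple[str, Callable[[pd.DataFrame], pd.DataFrame]]


def candidates(df: pd.DataFrame) -> Iterator[Candidate]:
    """Enumerate a small, fixed menu of single-step pandas transformations."""
    for col in df.columns:
        for asc in (True, False):
            yield (
                f"df.sort_values({col!r}, ascending={asc})",
                lambda d, c=col, a=asc: d.sort_values(c, ascending=a),
            )
        yield (
            f"df.groupby({col!r}, as_index=False).sum()",
            lambda d, c=col: d.groupby(c, as_index=False).sum(),
        )
    yield ("df.drop_duplicates()", lambda d: d.drop_duplicates())


def same_table(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Compare columns, dtypes and values, ignoring the row index."""
    return a.reset_index(drop=True).equals(b.reset_index(drop=True))


def propose_code(before: pd.DataFrame, after: pd.DataFrame) -> Optional[str]:
    """Return code for the first candidate that maps `before` onto `after`, else None."""
    for code, transform in candidates(before):
        try:
            result = transform(before)
        except Exception:
            continue  # e.g. a column that cannot be summed
        if same_table(result, after):
            return code
    return None


if __name__ == "__main__":
    before = pd.DataFrame({"region": ["East", "West", "East"], "sales": [10, 5, 7]})
    after = pd.DataFrame({"region": ["East", "West"], "sales": [17, 5]})
    print(propose_code(before, after))
```

Because every candidate carries the exact code string that produced the match, a failure is easy to explain: none of the listed transformations reproduced the target table.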


This is really interesting. I would almost frame it as “the missing GUI for pandas”.


I like that framing a lot! Aligns with some other comments about focusing on the positives, certainly!


Well done on making such a huge application!

From a user perspective, what are the benefits of using this rather than PowerQuery within Excel? From a functional perspective it seems to do something very similar (e.g. the demo on your front page is something I could just do in PowerQuery).


It's a good question why our users prefer Mito+Python over something like PowerQuery+M! One might similarly ask what's wrong with Excel+VBA - although I'll note I haven't heard anyone champion VBA recently... :)

In practice, most of our users have already started with Python by the time they use Mito. For now, we're not positioning ourselves as an alternative to PowerQuery, but rather as a tool for someone who is coming from spreadsheets, has chosen Python, and is struggling to write code.

The next obvious question is why our users are choosing Python in the first place. What I'll say here is that, like with any programming language, there are a huge number of reasons: some of our users prefer Python because that's what their colleagues work in; some choose Python because they think it's trendy/cool; others choose Python because that's where the libraries they want to use are; others are starting down the path of getting into ML (which is primarily in Python); others want to integrate with existing Python infrastructure within their company. We've also seen massive enterprises with top-down edicts to move to Python "within the next 5 years".

In practice, Python is the most popular general purpose programming language for data science - and so we're doing our best to meet our users where they are: writing Python code, in Jupyter Notebooks!


TBH I think your target market is quite confusing.

It seems to be a non-technical user who is struggling to write Python and wants an easy way out, but is willing to install a tool via a CLI within a python virtual environment, knows what a Jupyter Notebook is and possibly wants to start writing machine learning code?

If the target market is actually the 'struggling non-technical user', I suspect you will need to remove as much friction as possible, although I'm not entirely sure that is your target market.

IMO would be good to focus on how your product actually helps do analysis better than Excel + PowerQuery/M, because presumably there has to be some sort of functional benefit otherwise what's the point?


I think your description is a pretty accurate description of most of our users: they are struggling to write Python in a Jupyter Notebook, and can install some basic packages (albeit with some struggles; see our Discord install help channel). The ML code part, you're right, def more rare :)

Python code helps these users do a variety of tasks that aren't possible in other analytics tools like PowerQuery/M. Many of these tasks are specific to the company/existing infrastructures, as I mentioned above.

A super concrete example: the head of data strategy at a life-sciences company made the transition to Python primarily because the rest of his (2 person) team uses Python. They primarily communicate about new datasets using Mito generated code (e.g. here are the steps to clean this data) - but he's not great at Python - so in practice he uses Mito for 9/10 analyses he does to generate this code he sends to his colleagues!

Can give a few more if you'd like -- let me know!


Hope you are managing to sell lots and your product is a success :)

If not, it might be worth positioning your product as helping people to do analysis better / faster / more accurately and 'turbo charging' analysts rather than selling it as a tool for analysts who are out of their depth (which is a more negative target).


Super fair and honestly great feedback. I think the phrasing you say is probably much more appealing to users when they think about what they want/need!


The friction of getting started with Mito is something we spend a lot of time focusing on. For example, when it comes to the installation process, not only do users install Mito through a CLI, but because JupyterLab 2, JupyterLab 3, and Jupyter notebooks all support extensions in different ways, there are different installation commands that users need to run to get it working for their specific environment. Initially, we just gave users instructions in our docs about which commands to install for which environment. Now we've built a completely new Python package, the mitoinstaller package, that handles the entire installation process. It downloads Jupyter if they don't have it, detects which version of Jupyter they have installed, runs the installation commands for their JupyterLab version and Jupyter notebooks, and finally starts up the Jupyter server with a tutorial notebook. In the success case, users run two commands and then 2 minutes later have already imported data into their first Mito spreadsheet.
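
The detect-then-branch step of an installer like this can be sketched with the standard library's `importlib.metadata`. The plan labels returned by `choose_install_plan` are placeholders rather than mitoinstaller's real commands, and the version cutoff only mirrors the JupyterLab 2 vs. 3 split described above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def choose_install_plan(jupyterlab_version: Optional[str]) -> str:
    """Map a detected JupyterLab version to a hypothetical installation plan label."""
    if jupyterlab_version is None:
        return "install-jupyter-first"
    major = int(jupyterlab_version.split(".")[0])
    if major < 3:
        return "jupyterlab-2-path"
    return "jupyterlab-3-path"


if __name__ == "__main__":
    detected = installed_version("jupyterlab")
    print(detected, choose_install_plan(detected))
```

Separating detection from the decision keeps the branching logic testable without needing every Jupyter version installed.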

That initial friction reduction is important to our target users, who I would describe in two buckets:

1. Target open source adopters. These users are beginner to intermediate Python users who want to, or need to, write Python for data analysis. Most of the open source users who adopt Mito are already on their Python journey; we're not teaching them what Python is or what a notebook is in the vast majority of cases. Many of them have gone through Kaggle courses, taken a couple of data science classes at school, or are particularly resourceful. For those beginner users, and even for people like me who have written pandas code for a few years, some things are just much easier to do in a spreadsheet interface, like creating a pivot table or graph (two of our most popular features).

2. Decision makers at large enterprises responsible for moving their company from Excel to Python. Much like us, these decision makers think a ton about the friction of getting employees started with Python. In most cases, they set up JupyterHub (https://jupyter.org/hub) so users don't need to go through any installation processes themselves, and they control things like version control, turning notebooks into reports, etc. They generally also offer/require Python training courses, provide template notebooks, and have data scientists available to help the business end users when they get stuck.
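
For comparison with the spreadsheet workflow, here is roughly what a basic pivot table looks like when written by hand in pandas. The `sales` data is made up for illustration, and this is not necessarily the exact code Mito generates.

```python
import pandas as pd

# Hypothetical sales data: one row per region per quarter.
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West"],
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "revenue": [100, 80, 120, 90],
    }
)

# Rows = region, columns = quarter, cells = total revenue, like an Excel pivot table.
pivot = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)
print(pivot)
```

Remembering which of `index`, `columns`, `values`, and `aggfunc` maps to which pivot-table field is exactly the kind of detail a drag-and-drop interface saves users from looking up.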


Congratulations on all the product development progress!

My wife is a spreadsheet wizard and I’m excited to get her take on this too.


Thanks! Lots of investment in meta-programming in the past 3 months - in our case, code that writes code that writes code :)

Takes from spreadsheet wizards greatly appreciated.


Congrats on the launch!

Random, but: what program did you use to make the intro video? It looks really clean.


Thank you! It's recorded using QuickTime screen cap, and edited in Final Cut Pro. I also made some assets in Figma (e.g. the little spreadsheet grid background).

It took longer than I'd like to admit... but feel validated in spending that time now that you've asked :)



