Launch HN: MinusX (YC S24) – AI assistant for data tools like Jupyter/Metabase
96 points by nuwandavek 43 days ago | 29 comments
Hey HN! We're Vivek, Sreejith and Arpit, and we're building MinusX (https://minusx.ai), a data science assistant for Jupyter and Metabase. MinusX is a Chrome extension (https://minusx.ai/chrome-extension) that adds an AI sidechat to your analytics apps. Given an instruction, our agent operates your app (by clicking and typing, just like you would) to analyze data and answer queries. Broadly, you can do 3 types of things: ask for hypotheses and explore data, extend existing notebooks/dashboards, or select a region and ask questions. There's a simple video walkthrough here: https://www.youtube.com/watch?v=BbHPyX2lJGI. The core idea is to "upgrade" existing tools, where people already do most of their data work, rather than building a new platform.

I (Vivek) spent 6 years working in various parts of the data stack, from data analysis at a 1000+ person ride hailing company to research at comma.ai, where I also handled most of the metrics and dashboarding infrastructure. The problems with data, surprisingly, were pretty much the same. Developers and product managers just want answers, or want to set up a quick view of some metric they care about. They often don't know which table contains what information, or what specific secret filters need to be kept in mind to get clean data. At large companies, analysts/scientists take care of most of these requests over a thousand back-and-forths. In small companies, most data projects end up being one-off efforts, and many die midway.

I've tried every new shiny analytics app out there and none of them fully solve this core issue. New tools also come with a massive cost: you have to convince everyone around you to move, change all your workflows and hope the new tool has all the features your trusty old one did. Most people currently go to ChatGPT with barely any real background context, and admonish the model till it sputters some useful code, SQL or hypothesis. This is the kind of user we're trying to help.

The philosophy of MinusX mirrors that of comma. Just as comma is working on "an AI upgrade for your car", we want to retrofit analytics software with abilities that LLMs have begun to unlock. We also get a kick out of the fact that we use the same APIs humans use (clicking and typing), so we don't really need "permission" from any analytics app (just like comma.ai does not need permission from Mr Toyota Corolla) :)

How it works: Given an instruction, the MinusX chrome extension first constructs a simplified representation of the host application's state using the DOM, and a bunch of application specific cues. We also have a set of (currently) predefined actions (eg: clicking and typing) that the agent can use to interact with the host application. Any "complex action" can be described as a combination of these action-primitives. We send this entire context, the instruction and the actions to an LLM. The LLM responds with a sequence of actions which are executed and the revised state is computed and sent back to the LLM. This loop terminates when the LLM evaluates that the desired goals are met. Our architecture allows users to extend the capabilities of the agent by specifying new actions as combinations of the action-primitives. We're working on enabling users to do this through the extension itself.
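To make the loop above concrete, here is a minimal sketch in Python. It is not MinusX's code (the extension is not open source yet): `FakeApp`, `Action`, `run_code` and `scripted_planner` are hypothetical names, the "app" is an in-memory stand-in for the DOM, and the planner is a hard-coded placeholder where the LLM call would go.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """Hypothetical action primitive: a name ("click" or "type") plus arguments."""
    name: str
    args: tuple = ()


@dataclass
class FakeApp:
    """In-memory stand-in for the host application (assumption, not a real DOM)."""
    cell_text: str = ""
    ran: bool = False
    log: List[Action] = field(default_factory=list)

    def state(self) -> dict:
        # Simplified representation of the app state that is sent to the model.
        return {"cell_text": self.cell_text, "ran": self.ran}

    def execute(self, action: Action) -> None:
        self.log.append(action)
        if action.name == "type":
            self.cell_text += action.args[1]
        elif action.name == "click" and action.args[0] == "#run":
            self.ran = True


def run_code(code: str) -> List[Action]:
    """A "complex action" described as a combination of primitives."""
    return [
        Action("click", ("#cell",)),
        Action("type", ("#cell", code)),
        Action("click", ("#run",)),
    ]


def scripted_planner(instruction: str, state: dict) -> List[Action]:
    """Placeholder for the LLM: returns the next actions, or [] once the goal is met."""
    if not state["ran"]:
        return run_code("df.describe()")
    return []


def agent_loop(app: FakeApp, instruction: str,
               planner: Callable[[str, dict], List[Action]],
               max_steps: int = 10) -> int:
    """Plan, execute, recompute state, repeat; returns the number of planning rounds used."""
    for step in range(max_steps):
        actions = planner(instruction, app.state())
        if not actions:  # the planner judges the goal to be met
            return step
        for action in actions:
            app.execute(action)
    raise RuntimeError("step budget exhausted before the goal was met")
```

Calling `agent_loop(FakeApp(), "summarize the dataframe", scripted_planner)` runs one round of three primitive actions and then stops. The `max_steps` cap is an addition of this sketch, there so that a planner which never declares success cannot loop forever.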

"Retrofitting" is a weird concept for software, and we've found that it takes a while for people to grasp what this actually implies. We think, with AI, it will be more of a thing. Most software we use will be "upgraded" and not always by the people making the original software.

We ourselves are focused on data analytics because we've worked in and around data science / data analysis / data engineering all our careers - working at startups, Google, Meta, etc - and understand it decently well. But since "retrofitting" can be just as useful for a bunch of other field-specific software, we're going to open-source the entire extension and associated plumbing in the near future.

Also, let's be real - a sequence of function calls rammed through a decision tree does not make any for-loop "agentic". The reality is that a large amount of in-the-loop data needed for tasks such as ours does not exist yet! Getting this data flywheel running is a very exciting axis as well.

The product is currently free to use. In the future, we'll probably charge a monthly subscription fee, and support local models / bring-your-own-keys. But we're still working that out.

We'd be super stoked for you to try out MinusX! You can find the extension here: https://minusx.ai/chrome-extension. We've also created a playground with data, for both Jupyter and Metabase, so once the extension is installed you can take it for a spin: https://minusx.ai/playground

We'd love to hear what you think about the idea, and anything else you'd like to share! Suggestions on which tools to support next are most welcome :)




This is impressive! We use Metabase and I've been wanting this exact user experience for quite some time. So far, I've been dumping our Postgres schema into a Claude project and asking it to generate queries. This works surprisingly well, save for the tedious copy-paste between the two tabs. The Chrome extension workflow makes perfect sense.

Is there a way to select which model is being used? Anecdotally, I've found that Claude 3.5 Sonnet works incredibly well with even the most complex queries in one shot, which is not something I've seen with GPT-4o.


Haha, yes! We were doing the exact same thing. Also, there is so much context you can't capture with just table schema that you can if you integrate the extension deep into the tool. It also unlocks cross-app contexts (we're working on a way to import context from a doc to a metabase query, or from a sheet/dashboard to a jupyter notebook, etc.).

> Is there a way to select which model is being used? Not at the moment, but this is in our pipeline! We will enable this (and the ability to edit the prompts, etc.) very soon.

Do try it out and let me know what you think!


I love that you can take a screenshot and it starts to explain what it sees!

While this is clearly an AI analytics assistant, your "retrofit" approach certainly differentiates you from existing approaches: https://github.com/Snowboard-Software/awesome-ai-analytics

Not quite sure if this should be a separate category? It's more similar to the web automation agents like https://www.multion.ai/ than to https://www.getdot.ai/.


We love that feature too and use it quite a bit ourselves!

> Not quite sure if this should be a separate category?

We see ourselves at the intersection of generic browser-automation agents and generic coding agents. MinusX integrates deeply into jupyter/metabase (we had to do a lot of shenanigans to get the entire jupyter app context) and has more context than RPA agents do today. It is possible that eventually all these apps will converge, but we think MinusX will be more useful for anything data related than any of them for the foreseeable future.

To paraphrase geohot, we think that the path to advanced agents runs through specialized, useful intermediaries.


I really like you retrofit analogy - not sure if you coined it or geohot has.

It seems to me that's where a ton of start-ups are currently converging - not repairing the old, which would be too complicated, but understanding and "mending" for new usages, or functionalities.


Thanks! Not sure, I think the term has been in the ether for a while.

Yeah, I see that too. I think for the longest time there was no leverage in doing this sort of retrofitting (except for grammarly type of use cases). But with better intent capture (llms help here), we can actually fix up any existing gaps!


A Grammarly-style retrofit would’ve actually been appropriate here—I made a syntax error.

If you don’t mind, I’ll be stealing and using that analogy!

We were talking about that approach today with a friend who unifies parking apps across the country. He calls his engine UMM—Ultimate Macro Machine.

I’m working in the “classic” generic browser-automation agents with a unified API for meeting bots (transcription, voice input, etc.).


Haha, the analogy is totally yours to use :) Nice, yeah, there is a lot of leverage in building agent-like hooks into current workflows. Even if the agents are pretty mid right now (they are for any complex use case that needs long horizon planning), it's a great place to be in time for the next generation models to drop!


How does the AI know about things like other tables? Does it have some basic knowledge of Metabase’s link structure so it can navigate to a listing of all tables, then pull context from there for in-context learning while writing the query?

Anecdotally, my hardest problems w/ nl2sql are finding the right tables and adding the right filters.


Yep! MinusX uses Metabase APIs to pull relevant tables, schema, & dashboards to construct the context for your instruction.

> Anecdotally, my hardest problems w/ nl2sql are finding the right tables and adding the right filters.

Totally! Especially in large orgs with thousands of tables. Using your existing dashboards and queries gives useful context for picking the right tables for the query.
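One way to picture the "existing queries as context" idea is to count how often each known table shows up in saved queries and surface the most-used ones first. This is only an illustrative sketch, not how MinusX does it: `rank_tables` is a hypothetical helper, the saved queries are assumed to be available as plain SQL strings, and the regex is a naive stand-in for a real SQL parser.

```python
import re
from collections import Counter
from typing import Iterable, List

# Naive pattern: the identifier that follows FROM or JOIN (no subqueries, quoting, or CTEs).
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def rank_tables(saved_queries: Iterable[str], known_tables: Iterable[str],
                top_k: int = 3) -> List[str]:
    """Return up to top_k known tables, most frequently referenced first."""
    known = {t.lower() for t in known_tables}
    counts: Counter = Counter()
    for sql in saved_queries:
        for name in _TABLE_REF.findall(sql):
            name = name.lower()
            if name in known:  # ignore scratch tables and anything not in the schema
                counts[name] += 1
    return [table for table, _ in counts.most_common(top_k)]
```

With thousands of tables, a ranking like this would let only a handful of likely candidates go into the prompt instead of the full schema.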


XAI! Explainable AI: https://en.wikipedia.org/wiki/Explainable_artificial_intelli...

Use case: Evidence-based policy; impact: https://en.wikipedia.org/wiki/Evidence-based_policy

Test case: "Find leading economic indicators like bond yield curve from discoverable datasets, and cache retrieved data like or with pandas-datareader"

Use case: Teach Applied ML, NNs, XAI: Explainable AI, and first ethics

Tools with integration opportunities:

Google Model Explorer: https://github.com/google-ai-edge/model-explorer

Yellowbrick ML; teaches ML concepts with Visualizers for humans working with scikit-learn, which can be used to ensemble LLMs and other NNs because of its Estimator interfaces : https://www.scikit-yb.org/en/latest/

Manim, ManimML, Blender, panda3d, unreal: "Explain this in 3d, with an interactive game"

Khanmigo; "Explain this to me with exercises"

"And Calculate cost of computation, and Identify relatively sustainable lower-cost methods for these computations"

"Identify where this process, these tools, and experts picking algos, hyperparameters, and parameters has introduced biases into the analysis, given input from additional agents"


This is very interesting. Can we bring our own API keys? Is that in the roadmap?


Yes! Both bring-your-own-keys and local models are on the roadmap. The ETA for both is ~1-2 weeks.


In your demo, you seem to have performed everything on a small dataset.

How’s the performance on doing the same analysis on a dataset with 1 billion rows for instance?

Also does this work with self hosted Metabase or Metabase Cloud? Or both?


> How’s the performance on doing the same analysis on a dataset with 1 billion rows for instance?

This really depends on whether your tool can handle the scale. We only use a sample of the outputs when constructing the context for your instruction so it should be independent of the scale of the data. We mostly use metadata such as table names, fields, schemas etc to construct the context.
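As a rough illustration of why the context size need not grow with the data, here is a sketch that builds a prompt fragment from metadata plus a fixed-size sample of rows. `build_context` and its output layout are assumptions made for this example, not the extension's actual format.

```python
from itertools import islice
from typing import Iterable, List, Sequence, Tuple


def build_context(table_name: str,
                  columns: Sequence[Tuple[str, str]],
                  rows: Iterable[Sequence[object]],
                  sample_size: int = 5) -> str:
    """Describe a table using its schema and at most sample_size rows."""
    # islice stops after sample_size rows, so a lazy iterator over a huge
    # result set is never read past the sample.
    sample = list(islice(rows, sample_size))
    lines: List[str] = [
        f"table: {table_name}",
        "columns: " + ", ".join(f"{name} ({dtype})" for name, dtype in columns),
        "sample rows:",
    ]
    lines += [", ".join(str(value) for value in row) for row in sample]
    return "\n".join(lines)
```

Passing a generator over a billion rows produces the same eight-line context as passing a generator over ten rows. The query itself still runs in the host tool, so its speed depends on the warehouse behind it.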

> Also does this work with self hosted Metabase or Metabase Cloud? Or both?

Yep, it should work on both :) We have users across both


While I’m excited about the launch, I’m concerned that your data policies are extremely vague and seem to contain typos and missing parentheticals. As of 12:30p ET they say:

> We have nuanced privacy controls on minusx. Any data you share, which will be used to train better, more accurate models). We never share your data with third parties.

What are these nuanced controls? What data is used to train your models? Just column names and existing queries, or data from tables and query results as well that might be displayed on screen? Are your LLMs running entirely locally on your own hardware, and if not, how can you say the data is not shared with third parties? (EDIT: you mentioned GPT-4o in another comment so this statement cannot be correct.)

https://avanty.app/ is doing something similar in the Metabase space and has more clarity on their policies than you do.

Frankly, given the lack of care in your launch FAQs about privacy, it’s a hard ask to expect that you will treat customer data privacy with greater care. There is definitely a need for innovation in this space, but I’m unable to recommend or even test your product with this status quo.


I totally share your concerns about data (especially data that may be sensitive). We have a simple non-legal-speak privacy policy here: https://minusx.ai/privacy-simplified.

> Are your LLMs running entirely locally on your own hardware, and if not, how can you say the data is not shared with third parties? (EDIT: you mentioned GPT-4o in another comment so this statement cannot be correct.)

We're currently only using API providers (OAI + Claude) that do not themselves train on data accessed through APIs. Although they are technically third parties, they're not third parties that harvest data.

I recognize that even this may just be empty talk. We're currently working on 2 efforts that I think will further help here:

- opensourcing the entire extension so that users can see exactly what data is being used as LLM context (and allow users to extend the app further)

- support local models so that your data never leaves your computer (ETA for both is ~1-2 weeks)

We are genuinely motivated by the excitement + concerns you may have. We want to give an assistant-in-the-browser alternative to people who don't want to move to AI-native-data-locked-in platforms. I regret that was not transparent in our copy.

Thanks for pointing out the error in the FAQs, we somehow missed it. It is fixed now!


Very cool. Why is the ai so fast? (Impressive)


We've done a bunch of work to strip down the context and minimise the output tokens (which tend to be 100x as slow as input tokens). GPT-4o is pretty fast too :)
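A back-of-the-envelope model shows why trimming output matters more than trimming input under that 100x ratio. The ratio comes from the comment above; the per-token time of 0.1 ms for input and the token counts are made-up numbers for illustration, not measurements of any model.

```python
def estimated_latency_s(input_tokens: int, output_tokens: int,
                        input_s_per_token: float = 0.0001,   # assumed: 0.1 ms per input token
                        output_slowdown: float = 100.0) -> float:
    """Linear latency model: input is processed quickly, output is generated slowly."""
    output_s_per_token = input_s_per_token * output_slowdown
    return input_tokens * input_s_per_token + output_tokens * output_s_per_token


baseline = estimated_latency_s(4000, 200)       # full context, verbose answer
half_output = estimated_latency_s(4000, 100)    # same context, terser answer
half_input = estimated_latency_s(2000, 200)     # stripped context, same answer
```

Under these assumptions the baseline is about 2.4 s. Halving the output brings it to about 1.4 s, while halving the input only brings it to about 2.2 s.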


Thanks for the explanation. Can't wait to see the code when you open it up!


This looks cool. Current company uses Metabase extensively and this could be handy. What LLM is being used?


Currently, we're using GPT-4o. We've tested it with Claude as well and plan to roll out support soon!


Any chance of a Firefox extension?


As a Firefox user myself, yes! We plan to launch for other browsers after open sourcing MinusX (in ~1-2 weeks).


What happens when Metabase releases this? (Asking without malice!)


We're building an assistant that works across all your analytics apps. This means MinusX can use context from multiple apps to better fulfil your instructions. You can imagine a future version of MinusX reading data from a spreadsheet, putting it onto a Jupyter notebook / Metabase Table, and running further analysis.

When Metabase (or any other tool) builds an assistant, we aim to use it to further extend MinusX's capabilities!


What other analytics tools do you plan on supporting?


We're currently exploring the tools displayed on our website (Tableau, Grafana, Colab, & Google Sheets). But if you have a specific tool in mind, please do tell us at https://minusx.ai/tool-request


When do you expect Tableau support to be available?



