Improvements to data analysis in ChatGPT

mpeg · 2024-05-16T23:06:25

I like the concept, but I'm not sure I like the implementation.

The demo they show has an excel sheet source being transformed into another sheet by grouping by a specific column. I don't like how it does a transformation step, sends a new transformed file and then switches the display to this new file. I worry that this loses the history of what exactly has happened to the file, so you're letting a black box run an unknown transformation and trusting the result to be correct.

A better implementation would require a custom UI for this, similar to existing data wrangling tools where every action is logged so that it's clear what has happened to the file and the steps can be checked transparently and rolled back if needed. As it exists now, I would find it hard to trust.
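
A minimal sketch of what such a step log could look like. All names here (`LoggedTable`, `apply`, `rollback`) are hypothetical and for illustration only, not how ChatGPT or any existing wrangling tool works. The source rows are never mutated, every step is recorded with a description, and rolling back simply drops steps and replays the rest.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict
Table = list[Row]


@dataclass
class LoggedTable:
    """Hypothetical wrapper: original rows plus an ordered log of named steps."""
    source: Table
    steps: list[tuple[str, Callable[[Table], Table]]] = field(default_factory=list)

    def apply(self, description: str, fn: Callable[[Table], Table]) -> None:
        # Record the step; nothing is executed or overwritten yet.
        self.steps.append((description, fn))

    def rollback(self, n: int = 1) -> None:
        # Drop the last n steps (or all of them if n is too large).
        del self.steps[max(0, len(self.steps) - n):]

    def history(self) -> list[str]:
        return [description for description, _ in self.steps]

    def result(self) -> Table:
        # Replay every logged step against a copy of the source rows.
        rows = [dict(r) for r in self.source]
        for _, fn in self.steps:
            rows = fn(rows)
        return rows


# Made-up example data.
sales = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 8},
    {"region": "EU", "amount": 7},
]

t = LoggedTable(sales)
t.apply("keep amount > 6", lambda rows: [r for r in rows if r["amount"] > 6])
t.apply("keep EU only", lambda rows: [r for r in rows if r["region"] == "EU"])

full_history = t.history()
full = t.result()      # both steps applied
t.rollback()           # undo "keep EU only"
undone = t.result()    # only the first step applied
```

Because the result is always recomputed from the untouched source, the log itself is the audit trail.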

anotherpaulg · 2024-05-17T01:17:20

This is why I like to use LLMs to write the code to manipulate my data, draw my graphs, etc. That way I end up with the data/graph/etc artifact, but I also have the code that created it tucked away safely in my repo. So I can tweak and improve that code over time, either on my own or with the help of AI.

Here's an example where I recently used aider and GPT-4o to plot a graph.

https://aider.chat/2024/05/13/models-over-time.html

The graph itself is kind of interesting. It shows how LLM code editing skill has been changing over time as new models have been released by OpenAI, Anthropic and others.

krainboltgreene · 2024-05-17T01:38:17

> This is why I like to use LLMs to write the code to manipulate my data

One of the nice things about human programmers is that you can derive intent and create responsibility. Sometimes we have to encode that in a commit message or ticket, but it's there. When you find out that the program was subtly piping all data into /dev/null without anyone realizing it, you at least have a human to help figure out how you got here. Another example of this is the xz debacle.

What are you supposed to do when that's a thousand layers deep? What about when your next generation of programmers' ability to do things is exceptionally stunted by this effort?

weitendorf · 2024-05-17T02:37:06

I'm explicitly working on this in my startup's product (a GenAI for code product).

The obvious answers: record the human's intent in the form of their prompt, and record the LLM's raw output (if you use a conversational LLM out of the box, it almost always includes this even if you explicitly prompt it not to, lol). Of course, depending on your UX this may or may not work. For autocomplete there is no obvious user intent.

There are additional approaches which I'm exploring that require more intentional engineering, but essentially involve forcing "structure" so that more intent gets explicitly specified.

krainboltgreene · 2024-05-17T15:21:18

To be clear you're not working on what I pointed out, you're just doing the same thing. The prompt "may" encode intent, but that has no bearing on what code gets written or stored or changed.

Think about this as well: You're creating processes that have no responsibility. I am 100% responsible for all code I write, even if I wrote it wrong, but if the code your tool generates is wrong it's not my fault, I didn't write it. Multiply this by the hundreds of thousands of times this will happen in a given year and by each employee.

Frankly, you should re-evaluate if you even want your product in the world. What kind of future hellscape are you enabling?

Hansenq · 2024-05-17T00:01:55

The way ChatGPT Data Analysis works is that ChatGPT generates python code that does the transformation step. Because python code is static and deterministic, you can always re-run the code to get the exact output over and over again. If you want it rolled back, just re-run the code, but run fewer lines of it.

In fact, this is probably more accurate than custom UIs that log actions you take, since to build that UI, engineers need to make sure each action is logged (and it's easy to forget to log one!).

andy99 · 2024-05-16T23:59:41

Going to turn out like every other no code / low code tool: works great for a demo of something you'd never do, but customizing it to a real application is just as much work as coding it.

hermitcrab · 2024-05-17T08:01:04

No code / low code tools are widely used in industry for ETL, data wrangling and data analysis. It's a multi billion dollar industry. Of course, they aren't the best tool for every job.

mritchie712 · 2024-05-17T11:19:23

(note: I'm founder of Definite)

I think we're closer to what you're describing. All transformations in Definite[0] are done with SQL which you can easily inspect. The transformations are stored in your data model so everyone else can reuse them.

The setup we normally see at a company is one (or more) data people control the model which can be used by the rest of the company thru our AI assistant. Quick demo here: https://youtu.be/p6BoqX0cYnU?si=hg2V_KM8ScUzwrDx&t=147

0 - https://www.definite.app/

bhl · 2024-05-16T23:52:27

To collaborate with LLMs with text editors and spreadsheets, yeah I do think we will need deterministic declarative primitives that both users and LLMs can use. Getting those primitives right would also get us generalized autocomplete for apps.

zarathustreal · 2024-05-16T23:55:41

V0 by Vercel does essentially this using the RadixUI components as the declarative primitives to generate components

bhl · 2024-05-17T00:26:31

For a generative and collaborative web app, you would have to hook up those declarative components to function calls which are also declaratively written and perhaps use OT/CRDTs to mutate state.

ramoz · 2024-05-16T23:10:38

File lineage doesn’t seem like a complex feature addition here.

mpeg · 2024-05-16T23:29:53

Yes and no. Apart from the UI for it, it really depends on how they've implemented the transformations. For example, if it's writing and running Python behind the scenes (which I suspect is what is happening here, as that's what the old ChatGPT data analysis did), that would not work very well without having to read the code, which defeats the point of it being no-code...

You'd want deterministic steps: it would need to have a limited list of functions to choose from that do specific things, e.g. "group_by()", "filter()", etc., so that the code it runs for a given step is always the same.
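
A rough sketch of that idea, assuming the model is only allowed to emit a plan (a list of step names plus arguments) rather than arbitrary code. `ALLOWED_OPS`, `run_plan`, and the two operations are hypothetical names for illustration; the data is made up.

```python
from collections import defaultdict


def op_filter(rows, column, equals):
    """Keep rows whose `column` equals the given value."""
    return [r for r in rows if r[column] == equals]


def op_group_by(rows, column, sum_column):
    """Group by `column` and sum `sum_column` within each group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[column]] += r[sum_column]
    return [{column: key, sum_column: total} for key, total in totals.items()]


# The fixed vocabulary: anything outside this mapping is rejected.
ALLOWED_OPS = {"filter": op_filter, "group_by": op_group_by}


def run_plan(rows, plan):
    """Run a declarative plan; each step always executes the same code."""
    for step in plan:
        name = step["op"]
        if name not in ALLOWED_OPS:
            raise ValueError(f"operation not allowed: {name}")
        rows = ALLOWED_OPS[name](rows, **step["args"])
    return rows


orders = [
    {"region": "EU", "status": "paid", "amount": 10},
    {"region": "US", "status": "paid", "amount": 5},
    {"region": "EU", "status": "refunded", "amount": 7},
    {"region": "EU", "status": "paid", "amount": 3},
]

# This is the only thing the model would need to produce.
plan = [
    {"op": "filter", "args": {"column": "status", "equals": "paid"}},
    {"op": "group_by", "args": {"column": "region", "sum_column": "amount"}},
]

summary = run_plan(orders, plan)
```

The plan doubles as a human-readable record of what was done, which is the part a no-code user could actually review.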

StarlaAtNight · 2024-05-16T23:47:04

Might make sense for them to anchor to ibis for the code part (since it compiles down to SQL or Pandas) - and, being inspired by tidyverse, could easily translate to R

mbforbes · 2024-05-16T23:29:10

I'm interested by the first pull quote:

> "...analyzing customer data, which has become too large and complex for Excel. It helps me sift through massive datasets..."

The emphasis here is on the size of the data, but I think that's a red herring. Excel can certainly handle way more data than you're going to want to repeatedly pipe through an LLM. I'm betting it's the diversity of---or even lack of---schemas. Big piles of tables and unstructured data you'd normally have to normalize and join.

I spent years doing data analysis slowly with Python (pandas), and it took me a long time to accept that getting better at Excel was just more efficient. Hard to accept that code isn't always a task's best interface. Automating some Excel work sounds even better.

simonw · 2024-05-16T23:40:15

You don't pipe the data directly through an LLM with this tool - it writes Python code that manipulates the data.

So this will handle millions of rows of data quite happily. I've used it extensively with SQLite in the past - you can upload SQLite databases to it and have it run SQL against them. You can also ask it to save data to SQLite for you to download.
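
A rough sketch of that workflow using Python's built-in `sqlite3` module. The `orders` table and its rows are made up, and this is not the code ChatGPT actually generates. The point is that only the schema and the small query result need to reach the model, however many rows the file holds.

```python
import sqlite3

# Stand-in for an uploaded database file; ":memory:" keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10), ("US", 5), ("EU", 7)],
)

# Metadata the model can use to decide what SQL to write.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# The aggregation runs inside SQLite, not inside the LLM.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```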

mbforbes · 2024-05-17T00:48:07

Thank you, I completely misunderstood how it works. I'm guessing then it feeds in the metadata and writes code based on that.

levocardia · 2024-05-17T00:26:21

In the year of our lord 2024, Microsoft Excel is physically incapable of handling more than ~1 million rows or ~16,000 columns. Both are trivially easy to exceed even in very small applications or toy problems. ChatGPT does not ingest the rows of your spreadsheet as tokens, it just uploads the file and runs Python directly on top of it.
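
For reference, the exact per-worksheet limits are 1,048,576 rows and 16,384 columns. A minimal sketch of checking a CSV against them before trying to open it in Excel; `fits_in_excel_sheet` is a hypothetical helper, not part of any library.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576   # rows per worksheet
EXCEL_MAX_COLS = 16_384      # columns per worksheet


def fits_in_excel_sheet(csv_file) -> bool:
    """Stream a CSV and report whether it fits in a single worksheet."""
    n_rows = 0
    for row in csv.reader(csv_file):
        n_rows += 1
        if n_rows > EXCEL_MAX_ROWS or len(row) > EXCEL_MAX_COLS:
            return False
    return True


small = io.StringIO("region,amount\nEU,10\nUS,5\n")
small_fits = fits_in_excel_sheet(small)
```

The check streams the file row by row, so it works on inputs far larger than Excel itself could load.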

tomnipotent · 2024-05-17T00:59:51

Excel Data Model has been around for years (2013?) and can handle ~2bln rows across a maximum of 2bln tables or 2bln columns. I don't know what that means in the context of ChatGPT, but the capability exists.

serjester · 2024-05-16T23:44:16

Can you elaborate on what you found Excel better for?

mbforbes · 2024-05-17T00:54:11

Nothing capability-wise! Just speed and ease of manipulation.

Maybe an analogy is doing image processing. One could write filters and other manipulations in code, or use something like Photoshop. It turns out Photoshop is a great interface for many of those tasks. I think the same thing is true for more data analysis than I'd realized because of my poor Excel skills.

Still want to improve my spreadsheet abilities though. There's a fun video called You Suck at Excel by Joel Spolsky I'm halfway through [1]. Would love to know of more like this.

[1] https://www.youtube.com/watch?v=JxBg4sMusIg

wuj · 2024-05-16T23:23:53

It starts to feel like a trend where OpenAI is integrating features that were previously implemented by GPT-wrapper startups into ChatGPT. While these startups have added value by enhancing user experience, the trajectory is leading towards an ecosystem where these functionalities seamlessly integrate. The future will be challenging for those startups.

paxys · 2024-05-17T00:40:32

No need to speculate on this.

https://the-decoder.com/sam-altman-explains-why-openai-might...

> OpenAI CEO Sam Altman has a clear message for startups developing products based on OpenAI's GPTs: They should assume that the models will improve drastically with each new release, rather than relying on the current state of the technology.

> Altman uses GPT-4 as an example: Any company that builds something based solely on GPT-4 is likely to be surpassed by GPT-5 if it is as big a leap over GPT-4 as GPT-4 was over GPT-3. Those companies will be "steamrolled" by OpenAI, he says. "Not because we don't like you, but because we have a mission."

vanjajaja1 · 2024-05-17T00:54:46

> "Not because we don't like you, but because we have a mission."

what a phrase, will be borrowing it

RobertDeNiro · 2024-05-16T23:33:15

It always seemed like a losing bet to implement a startup purely on top of the OpenAI APIs. If you do that you have zero moat.

ChildOfChaos · 2024-05-17T09:04:10

I remember even Sam said this: he said something along the lines that if you are just building something that is effectively a missing feature of ChatGPT or one of their products, they are going to end up replacing you, so you need to be building something more significant.

zarathustreal · 2024-05-16T23:56:55

Proprietary data is the moat in those cases, unless you’re literally a wrapper

pseudosavant · 2024-05-17T00:58:14

That is the conclusion I’ve come to with all of my AI ideas so far. Easy to be replaced by a feature in ChatGPT or Copilot. Hard to create a meaningful moat.

alvah · 2024-05-17T00:52:36

The founders of Conversion.ai / Jarvis / Jasper / whatever they're called today did OK out of it though.

ttul · 2024-05-17T00:00:19

Furthermore, it’s increasingly clear that OpenAI is doing a “bottom-up” challenge to Microsoft and Google. I would not be surprised at all if OpenAI launches an email service to compete with Gmail, imbued with a fine-tuned model that is optimized specifically for working with email. And then a document editor… and a spreadsheet… etc. There is huge money in productivity software. Microsoft 365 generates the bulk of the “cloud” revenue on Microsoft’s P&L. IIRC, worldwide revenue from Google Workspace and Microsoft 365 - and whatever other minnows can survive underneath them - is supposed to reach $40B by 2030. I apologize for not providing the source.

wuj · 2024-05-17T00:14:28

I think it will be less of a replacement and more like a partnership, per se. It will be hard for OpenAI to challenge services like Gmail due to the network effect. Same with Microsoft 365: People are used to that ecosystem. The success of partnerships hinges on whether Microsoft and Google can develop their in-house models and integrate them into their core products. OpenAI's partnership with Apple was a successful example of this strategy.

copperx · 2024-05-17T01:22:36

Network effect? OpenAI is the fastest adopted product ever. It's the other companies who have to worry about the network effect.

diatone · 2024-05-17T02:26:41

A high rate of customer acquisition isn’t the same thing as a network effect.

OpenAI has some network effects baked into its product suite, but it’s not even in the same league as Microsoft’s bundling strategy (setting aside Google for a moment). Microsoft’s history is littered with competitors who had a better product, a faster adopted product, who were snuffed because of Microsoft’s superior distribution capabilities.

I’m on the train at the moment, but the big exception is the mobile market, and in hindsight that makes perfect sense - most of the phone market is consumer grade, where bundling productivity software isn’t much of a value proposition.

It’s far more likely that OpenAI remain where they are on the value chain, because it’s easier to capture and integrate AI startups into their products. If they were to compete on productivity software, they need to differentiate with entirely new modalities of AI-informed user interfaces, or they will be yet another slightly better program that got blown out of the water by MSFT enterprise distribution

edit: to be clear my point is, why would you become a bit player in an established market when you can dominate a new market you created yourself?

Hansenq · 2024-05-17T00:04:29

I think OpenAI is targeting clear enterprise use-cases for what they're building into ChatGPT. Data Analysis is a clear enterprise use-case. So, if a feature helps them sell ChatGPT Enterprise, I think they'll build it, since that's a large revenue driver for that.

Consumer-focused wrapper startups like jenni.ai, research helpers, or math tutors I doubt they'll focus on, since most of the revenue is in enterprise.

wuj · 2024-05-17T00:20:31

Absolutely. However, as OpenAI forms partnerships and begins to offer ChatGPT as a plugin across various platforms, many of those niche applications would be replaced.

ashu1461 · 2024-05-17T01:33:32

I was looking at Vertex AI agents today, their implementation of agents is also like how 100 other startups have done it, no innovation nothing. Same copy pasta stuff

hermitcrab · 2024-05-17T08:06:32

Building a product on top of someone else's platform is always risky. Especially if your product is only a thin veneer.

nbzso · 2024-05-16T22:39:53

I will ask again: The corporate data in the cloud? Sales, relationships, confidential information. Am I crazy, or is the world getting ridiculous? First killing copyright. Now what? Privacy? Security?

jwrallie · 2024-05-16T23:07:29

My guess is that most people making the decision to use ChatGPT on confidential data are low level employees looking to reduce their workload, and their bosses are happier with the improved performance (with the same salary).

Going on a tangent, the other day a friend of mine was explaining how he uses Google Drive to copy files from an iPad to a Windows laptop. He is trying hard to make sure three letter agencies have an extra copy of his data, just in case.

meowkit · 2024-05-16T23:09:40

This is why Confidential VMs have been a cloud priority.

https://learn.microsoft.com/en-us/azure/confidential-computi...

https://cloud.google.com/confidential-computing/confidential...

The hypervisor cannot read its guests' data.

skissane · 2024-05-17T00:37:11

What happens if there is a warrant or subpoena?

Is there some kind of "backdoor" to enable the cloud provider to grant law enforcement/intelligence/etc access? Or not?

From what I understand, theoretically with something like AMD SEV it could be configured so no such "backdoor" exists. However, is it actually set up that way in practice?

jonas21 · 2024-05-16T22:47:42

Huh? Almost everyone stores corporate data in the cloud and has for years.

threeseed · 2024-05-16T23:33:12

Most enterprises still have their core systems either on-premise or in private clouds.

And the trend has been towards standardising on Kubernetes to allow for the hybrid model to work.

nbzso · 2024-05-16T22:54:49

So I am the only one stupid enough to have a private cloud? And the biggest and most successful corporations have trusted their intellectual property to Microsoft and Google? Boy, I am living under a rock. It is time to break all the walls. Right?

cloudking · 2024-05-16T23:02:24

Microsoft and Google have thousands of security engineers focused on protecting your data.

neffy · 2024-05-16T23:19:50

This would be the same Microsoft that can't explain to the US State Department how there was a golden key floating around that allowed their email to be hacked?

https://techcrunch.com/2023/07/17/microsoft-lost-keys-govern...

milemi · 2024-05-16T23:06:39

Even from you https://www.theguardian.com/australia-news/article/2024/may/...

cooper_ganglia · 2024-05-16T23:11:35

Oh, well in that case, they can access whatever data they'd like!

mewpmewp2 · 2024-05-16T23:24:58

The calculation considers potential financial damage from having a cloud based security issue and most will find that the financial damage * risk < efficiency gains from just using cloud where everything is handled for you and you can just focus on your work.

a_wild_dandan · 2024-05-16T23:18:56

People make different decisions due to different preferences. There’s no need for melodramatic character judgements here, relax lol. You’re not stupid. People who chose differently aren’t either.

throwaway98797 · 2024-05-16T22:56:23

if it’s not hard for you why change?

if it is, maybe time to reconsider.

vikramkr · 2024-05-16T23:30:44

Uhh, I guess yes? There's a reason that pretty much the entire internet goes down every time AWS has an outage lol.

VWWHFSfQ · 2024-05-16T22:52:39

Microsoft already has everyone's corporate data. Always has.

wddkcs · 2024-05-16T22:49:17

Privacy has always been dead. Idk why people love its corpse so much.

chipgap98 · 2024-05-16T22:20:20

It will be interesting to see if ChatGPT can drive meaningful differentiation through features like this. Why would a business use this solution as opposed to native solutions from Microsoft and Google?

7thpower · 2024-05-16T23:06:38

Office copilot sucks, except for Teams, which is great.

The rest is vaporware, basically. Obviously it will get a lot better, but this is refreshing.

Hopefully Microsoft has something better to show at Build that isn’t a bunch of half baked tools that will… maybe be better in two years.

taberiand · 2024-05-16T22:59:02

Given the level of partnership between Microsoft and OpenAI, wouldn't this just be the same feature offered in different ways between ChatGPT and Office 365?

billy_bitchtits · 2024-05-16T23:15:32

Can Copilot in Office 365 do this? I have that at work and have ChatGPT at home and find ChatGPT to be way better at data analysis. I can’t see how to get Copilot to do it outside of manipulating Excel.

jnnnthnn · 2024-05-16T22:24:10

> Instead of downloading files to your desktop and then uploading them to ChatGPT, you can now add various file types directly from your Google Drive or Microsoft OneDrive. This allows ChatGPT to understand your Google Sheets, Docs, Slides, and Microsoft Excel, Word, and PowerPoint files more quickly.

As far as I know, this is the first integration of other cloud products in ChatGPT that's done as a 1st party integration (vs. "GPTs", which are all 3rd party)?

lelandfe · 2024-05-16T22:28:18

Depends on what "cloud products" means, I suppose, but Browse with Bing?

jnnnthnn · 2024-05-16T22:30:59

True! I had "things that store my data in the cloud" in mind.

Jimmc414 · 2024-05-16T22:39:32

Looks like it will be nice when they work some of the kinks out. I'm now getting an internal error with the environment 'AceInternalException' when trying to display data in a pandas dataframe to the console. I looked around for about 5 minutes for a link or route to properly report the reproduction steps as a bug but didn't find anything and gave up.

carlosbaraza · 2024-05-17T10:46:03

One interesting point of this feature is that it acknowledges that natural language is not a sufficiently good interface, we still need custom UI for particular use cases. As long as humans are in the driver seat, multiple UIs for the same problem will be created and used by many, enabling the SaaS ecosystem to thrive with this new LLM tooling backend.

mritchie712 · 2024-05-17T11:03:15

(note: biased because I'm founder of https://www.definite.app/)

Completely agree, chat is not going to take over every UI that's been developed over the past decades.

One example: are you going to build a "Company Dashboard" in ChatGPT?

ChatGPT is good for "last mile" adhoc analysis (e.g. you already have all your data in a single file and you want to ask a few questions). But you're not going to build dashboards and standard reports in it just like you wouldn't run your CRM in it.

phillipcarter · 2024-05-16T22:56:26

I've believed for quite a while that data analysis is a much better use of this kind of AI than coding assistants is. Really happy to see OpenAI investing in this space. There's likely loads and loads more improvements to make.

ramoz · 2024-05-16T23:05:18

You mean using code generation in a way that replaces the developer (in this case the data scientist who would be asked to conduct/code the analytics).

moralestapia · 2024-05-16T22:43:37

"OpenAI doesn't have a moat", yet every time they announce something it goes on to trend worldwide.

I'm not even a fan of them, and I dislike the hypocrisy behind the name; and still, ChatGPT has replaced Google for 90% of my queries. It's starting to feel like a burden to have to open Google when ChatGPT doesn't give me what I want for some reason. As soon as they're able to "ground" results, 99% of my queries will go there.

Anyone that doesn't understand this is just coping hard with this new reality.

Bjartr · 2024-05-16T22:52:31

That they are delivering useful capability is orthogonal to whether or not they have a moat. They have first-mover advantage, no doubt, but there's very little stopping others from catching up.

px43 · 2024-05-16T23:07:43

Why do I see moats constantly framed as a good thing? Tech should be about empowering people to do great things, not maximizing the extraction of value from users. Anyone should be able to catch up with anyone else. Anything that prevents that is literally "anti-competitive", which is generally considered to be predatory behavior.

earleybird · 2024-05-16T23:45:41

A moat is good or bad depending on which side of the moat you are on.

greenavocado · 2024-05-16T23:35:52

As temporarily embarrassed startup founders, the VCs we talk to are constantly asking us about our moat in order to secure investment.

dingclancy · 2024-05-17T00:02:23

If there is very little stopping them, where are they? Google Gemini is the closest right now and they are bungling their launches.

moralestapia · 2024-05-16T23:00:38

>there's very little stopping others from catching up

And yet they don't!

Talk is cheap.

HarHarVeryFunny · 2024-05-16T23:32:50

Claude is at the same level as GPT-4.

Anthropic have just hired a Chief Product Officer (Mike Krieger - co-founder of Instagram), so you may see more layered on top of the core AI if that is what you are referring to, although they seem to have a different focus (more corporate) to OpenAI, so don't hold your breath for an Anthropic AI girlfriend.

OpenAI seem to be trying a shotgun strategy of productizing GPT any way they can, but I'm not sure that's the best long-term strategy. Maybe better to build an ecosystem - build the OS/APIs, and let 3rd parties build the applications on top of that.

mupuff1234 · 2024-05-16T23:41:21

Seems like Google is mostly caught up.

dingclancy · 2024-05-17T00:05:05

They are as caught up as Bing is to Google Search. It does not really mean anything.

moralestapia · 2024-05-16T23:49:17

Can you show me the Google equivalent of GPT-4o?

mupuff1234 · 2024-05-17T00:17:44

In terms of performance GPT-4o doesn't seem like an improvement over GPT-4 (even worse in some cases afaiu)

And Google showcased the project astra thing, which seems like the equivalent.

moralestapia · 2024-05-17T00:24:52

Google has been "showcasing" since I can't remember, if only they shipped instead :'(.

Remember Duplex [1]? Which then turned out to be fake. The more recent fake Gemini demo? What an embarrassment.

1: https://www.youtube.com/watch?v=D5VN56jQMWM

mupuff1234 · 2024-05-17T00:29:15

How is duplex fake? It's available.

moralestapia · 2024-05-17T03:06:35

https://www.engadget.com/2019-05-22-google-duplex-is-made-of...

mupuff1234 · 2024-05-17T03:23:25

So initially the quality wasn't 100%, how is that fake? Over selling of capabilities, sure.

LLMs also aren't correct 100% of the time, far from it.

tempaccount420 · 2024-05-16T23:40:28

Until this gets implemented in an open source AI chat UI, and now it doesn't matter what model you're using it with.

dingclancy · 2024-05-17T00:00:56

What open source AI chat UI exists now that has the level of usage of ChatGPT?

I love open source but their very nature suggests there is no one open source model that will have the critical mass to just be a straight in replacement for ChatGPT and its UI.

I use Ollama and choose models and they are close to GPT-4 but OpenAI is speeding up into making a useful chat product. All the rest are foundational model demos.

chocoboaus2 · 2024-05-17T00:28:01

Just more proof that OpenAI is now a product led company and not a research led company (obviously they will continue to research but that will lead to consumer product).

Big pivot.

worstspotgain · 2024-05-16T23:32:27

I picture every AI founder instantly shivering in cold sweat when OpenAI news breaks. Deer Hunter, Hayes Valley style.

iamsanteri · 2024-05-16T22:47:05

Isn’t this literally what all sorts of analysts do as their core jobs? Are they getting automated away?

andy99 · 2024-05-16T22:47:22

No. People that boil a job down to a simple cognitive task rarely understand what the actual work of the job is.

ImaCake · 2024-05-17T01:07:28

Exactly. I am a "Data Scientist" for whatever that label is worth. But what I actually do is domain specific to outdoor air quality. These tools help automate the boring stuff - if chatGPT made programming obsolete for me tomorrow I wouldn't lose my job, I would just be able to spend my time doing something more useful. But somehow I doubt it will.

nextworddev · 2024-05-16T23:01:57

You are right, but this tool gives executives a nominal reason to downsize.

zx8080 · 2024-05-16T23:15:37

Is downsizing a thing that execs are afraid of?

Less budget, less power.

andy99 · 2024-05-16T23:49:10

In practice it doesn't work that way. What's downsized is line workers that execs treat as fungible and don't care about anyway. Any "savings" goes to more management to run whatever the replacement solution is. Someone like a foreman may see their headcount reduced but they are the only people who would care.

culopatin · 2024-05-17T02:38:38

Is data analysis what I should use if I have multiple files I want to chat about? What if I have a long multipage document? It’s starting to become obfuscated what version I should use for what.

pottertheotter · 2024-05-17T03:07:29

There's no difference these days. It's just part of any other chat. It used to be that you had to start a chat with data analysis. Now you just pick your model (3, 4, or 4o).

hermitcrab · 2024-05-17T08:04:41

If you are using an LLM to write code, because you aren't skilled enough to do it yourself, how are you going to know if what it is producing is correct?

sebastiansm · 2024-05-16T22:35:25

What alternatives exist to the analysis feature of chatgpt?

mritchie712 · 2024-05-17T10:52:49

(note: I'm founder of https://www.definite.app/)

ChatGPT is good for "last mile" adhoc analysis (e.g. you already have all your data in a single file and you want to ask a few questions).

If you're looking for data infrastructure with an AI assistant, try https://www.definite.app/. We give you a full data stack in one app:

1. Built-in data warehouse - We spin up a duckdb database for you

2. 500+ connectors (e.g. Postgres, Stripe, HubSpot, Zendesk, etc.) - You don't need to buy a separate ETL, it's also built-in

3. Semantic layer - Define dimensions, measures, and joins using SQL in one place. We have pre-built models for all the sources we support (e.g. the Stripe model already has measures for MRR, churn, etc.)

4. Simple BI / Dashboards - Build a table with the data you want and generate visuals off that table. Works like a pivot table, if you can use a spreadsheet, you can use Definite.

5. AI assistant to help you thru all of this and analyze your data

I'm mike@definite.app if anyone wants help getting set up.

boyd · 2024-05-17T03:06:31

We’re more focused on data visualization vs. general “chat with your data”, but building in this space: https://minard.ai

Upload or import a CSV, Excel, Google sheet, etc. and also just launched Postgres and Snowflake connectors today!

Hansenq · 2024-05-16T23:55:16

If you're looking for a more powerful version of ChatGPT Data Analysis, I recommend https://julius.ai/.

If you're looking for a version of ChatGPT Data Analysis, but want to offer it to your customers, take a look at us! https://www.lightski.com/ (disclaimer: I'm the co-founder.) We're building this feature of ChatGPT, but in a way that you can embed inside your app, so that your end-users can get an AI Data Scientist without needing to export data, then import it into ChatGPT or Julius.

andenacitelli · 2024-05-17T02:34:50

Akkio has a pretty good chatbot for this as well as a much longer tail of data transformation and predictive modeling / AutoML features.

Disclaimer: I work at Akkio, but think we have a really nice, value-add product and an excellent team behind it :)

cstanley · 2024-05-16T22:53:49

https://patterns.app/, but is more for databases

loumaciel · 2024-05-16T23:14:51

https://www.findly.ai/ works with databases.

jnnnthnn · 2024-05-16T22:37:26

https://julius.ai is great

CephalopodMD · 2024-05-16T23:19:21

Google just announced their version of this on Tuesday with Gemini

skadamat · 2024-05-16T23:33:29

I love that the "Authors" section just lists "OpenAI". For a split second, I thought it was written entirely by ChatGPT :)

jp0d · 2024-05-17T00:30:32

Yes, looks great for small businesses and individuals. I don't think any big enterprise would hand over their confidential data to OpenAI.

numbers · 2024-05-16T23:48:14

hmm, the current charts generated are pretty basic and kinda ugly so I'll continue using Excel/Sheets but I'm hoping this can be used to generate charts out of tables for presentations.

fblp · 2024-05-16T23:36:27

I'd love to see this for finance statements and data.

Is anyone working on this?

rohitghumare · 2024-05-16T23:44:07

I hope the demo will be the same as the actual implementations.

m3kw9 · 2024-05-16T23:23:45

You could already do that if you don’t need google sheets

autokad · 2024-05-16T22:51:44

you don't need this though, you can just ask chat gpt to give you python code for running general data analysis

7thpower · 2024-05-16T22:56:00

Not everyone knows how to python or wants to deal with the frustration of learning it. This is a big deal for a lot of people, especially corporate workers who don’t have access rights to run code in the first place.

sv123 · 2024-05-16T23:55:56

It is so much easier to do it in a chat interface and iterate on the exploratory analysis... copy/paste back and forth from a notebook is terrible.

ramoz · 2024-05-16T23:08:17

The point is they’re headed in a direction that erases the need for someone who must work with code in order to understand analytics

VWWHFSfQ · 2024-05-16T22:55:36

sure, but I'm super lazy. I just want the chart or table or whatever so I can paste it in an email and then go home.

hooloovoo_zoo · 2024-05-16T23:23:15

Until the LLM has more to go on than the column name, all these analytics AI tools are doomed to failure for anything nontrivial.

vikramkr · 2024-05-16T23:29:48

I mean, if you're going to doom these tools to failure you should probably pick a higher bar than 'can look at more than column name' since that's not going to take very long at all to happen at this rate of progress.

hooloovoo_zoo · 2024-05-16T23:37:08

You are way underestimating the difficulty there. Most often the actual column does not have a detailed description somewhere, so the LLM is going to have to understand your entire codebase, including whatever is logging the column and what that log actually means. OpenAI has been trying to do this for 5+ years in various forms.

vikramkr · 2024-05-16T23:45:52

Honestly for companies that are really on board with the AI stuff they'll probably start shaping their data and processes to work well with these tools. And the whole 'understand your entire codebase and the meaning of the log' thing is both what these models are already decent at and also not really relevant here? I double checked the announcement and I don't see anything about code or logging? Seems to be pretty straightforwardly a data analysis tool to work with the standard data formats (excel, google sheets) that the vast majority of people use.

hooloovoo_zoo · 2024-05-17T00:03:48

Suppose you have some natural language analysis question like “how many users clicked on x product?”. In the demo, there’s always a ‘clicks’ table and the LLM just gets it right. In the real world, that product has multiple links on multiple platforms, some of which represent a prefetch, and the product really comes in multiple SKUs, so do you want all of them, bla bla bla. It would already be helpful without AI to better document how a table works, but it doesn’t happen. There are already better ways to document tables than natural language, which nobody uses, so I doubt businesses are suddenly going to be great at documentation. The LLM will get the wrong analytic answer because it doesn’t know how anything works, which makes the analytic tools not very useful.
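To make the clicks example concrete, here is a rough sketch of how the "obvious" count and a more careful count can diverge. Everything in it is hypothetical: the log rows, the `prefetch` flag, the `X-` SKU prefix convention, and the helper names are invented for illustration and not taken from any real schema.

```python
# Hypothetical click log; field names and values are assumptions.
clicks = [
    {"user": "u1", "sku": "X-RED", "platform": "web", "prefetch": False},
    {"user": "u1", "sku": "X-RED", "platform": "web", "prefetch": True},
    {"user": "u2", "sku": "X-BLUE", "platform": "ios", "prefetch": False},
    {"user": "u3", "sku": "X-RED", "platform": "email", "prefetch": True},
    {"user": "u4", "sku": "Y-ONE", "platform": "web", "prefetch": False},
]

def naive_count(rows, sku):
    """Count every logged row for one SKU, as a column-name-only reading might."""
    return sum(1 for r in rows if r["sku"] == sku)

def users_who_clicked(rows, product_prefix):
    """Distinct users with a real (non-prefetch) click on any SKU of the product."""
    return {
        r["user"]
        for r in rows
        if r["sku"].startswith(product_prefix) and not r["prefetch"]
    }

naive = naive_count(clicks, "X-RED")
careful = len(users_who_clicked(clicks, "X-"))
print(naive, careful)
```

Neither number is wrong as arithmetic; which one answers the question depends on knowledge (what a prefetch row is, which SKUs belong to the product) that is not in the table itself.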

vikramkr · 2024-05-17T01:07:50

So figuring out and reasoning through that mess is pretty much the specific thing that LLMs are good at. I'd recommend actually trying the newer models if you're still operating on the assumption that these are at GPT-3 level. Random (probably cherry-picked) Twitter thread, but it seems to be doing fine for the type of stuff that makes up 99% of spreadsheet use as far back as April 2023: https://twitter.com/emollick/status/1652170706312896512

That real world example also doesn't seem representative of the bulk of real world use of spreadsheets. Even in the announcement they're focusing on simple pivot tables and creating presentations. I don't see any reason why this wouldn't be able to handle asking questions on a data dump from quickbooks or the like. I get the feeling that you might be operating in a very unusual context, like in tech or finance, where you might even have professional data analysts who are hired to work with data. That's probably the biggest market revenue wise for this stuff but in terms of the number of users, spreadsheets are used EVERYWHERE

hooloovoo_zoo · 2024-05-17T02:57:10

I don't agree that they are good at it, but even if they are, I don't think the models even have the information needed to reason; they just have a very short abbreviation. Even in that very simple example the model is screwing up, running regressions determined entirely by one outlier and not even copying the R^2 to the text correctly. So now someone has to go back and fix the model's work.
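The outlier point can be illustrated with a small sketch. The numbers below are invented for illustration and are not the data from the linked thread; `r_squared` is a hand-rolled helper for a simple one-variable linear fit, written with the standard library only.

```python
def r_squared(xs, ys):
    """R^2 of a simple linear regression of ys on xs (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Made-up data with essentially no linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 3, 1, 2]
r2_without = r_squared(xs, ys)

# The same data plus a single extreme point.
r2_with = r_squared(xs + [30], ys + [30])

print(round(r2_without, 3), round(r2_with, 3))
```

With the single extreme point added, the fit looks strong even though the other five points show no trend at all, which is the kind of result that needs a human check.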

Spreadsheets ARE everywhere, but they're even harder to connect back to the source of the data because their proximal source is some e-mail message. There just isn't enough information in the sheet to tell the model what the data actually means.

jmpeax · 2024-05-16T23:37:05

"Hey there gorgeous! Do you want me to analyse your tables? ;)"