Launch HN: Roe AI (YC W24) – AI-powered data warehouse to query multimodal data
60 points by richardmeng 63 days ago | 35 comments
Hey HN, we’re Richard and Jason from Roe AI (https://getroe.ai). We’re building a query engine that lets data people do SQL queries on various kinds of unstructured data (videos, images, webpages, documents) using LLM-powered data processors.

Here is a 3-minute video: https://www.youtube.com/watch?v=9-WwJk1v5mI, showing how to create an LLM data processor for videos, build a semantic search over image data, and use it all from SQL. The problem we tackle is that data analysts cannot quickly answer their business questions around unstructured, multimodal data. For example, product teams want to analyze user session replay videos to understand the pain points of using their product. Ads teams need to know everything about an advertiser based on their web pages, such as the products they offer, payment methods, etc. Marketing teams need to know how product placement or music in a marketing campaign could get more views. And so on.

For structured data, questions like these can be answered quickly with SQL queries in Snowflake or BigQuery. But with unstructured multimodal data, it becomes a complex analysis process: open a Python notebook, write custom logic to fetch the multimodal data from blob storage (or write a crawler first if you need webpage data), find an AI model, do prompt engineering, then do data ops to productionize the workload in a data workflow, etc. We simplify this process to a few lines of SQL.
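To make the notebook workflow we're replacing concrete, here is a rough sketch (a hedged illustration: the field names and prompt are hypothetical, and the blob-storage and LLM calls are left as comments since they need credentials):

```python
# Sketch of the manual pipeline a data analyst faces today: pull files
# from blob storage, prompt a multimodal model per file, collect rows.
import json

def build_extraction_prompt(fields: list[str]) -> str:
    """Ask the model to return one JSON object with exactly these fields."""
    return (
        "Extract the following fields from the document and reply with "
        "a single JSON object (no prose): " + ", ".join(fields)
    )

def parse_model_reply(reply: str, fields: list[str]) -> dict:
    """Validate the model's JSON reply and keep only the requested fields."""
    data = json.loads(reply)
    return {f: data.get(f) for f in fields}

# In a real notebook you would then loop over s3://... objects, call a
# vision-capable LLM with build_extraction_prompt(...) per file, run the
# replies through parse_model_reply(...), and load the dicts into a
# warehouse table -- plus scheduling and retries to productionize it.
fields = ["advertiser_name", "payment_methods"]
print(build_extraction_prompt(fields))
```

Each of those commented steps is a separate piece of glue code to own and operate, which is the overhead the SQL interface collapses.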

How it works: first, we leverage multimodal LLMs as data processors because they're good at information extraction, classification, and other arbitrary tasks on unstructured data. Next, we've built a user interface for data people to explore multimodal data and manage AI components. Then we have a quick semantic index builder for multimodal data. (We often see databases provide vector search functionality but not index building, so we built that.) We also provide utility functions for dealing with multimodal data, like a video cutter, a PDF page selector, etc. Finally, SQL is the command line for slicing and dicing multimodal data.

How we got here: I've experienced three data evolutions in the last 10 years. At UC Berkeley, I was a data researcher using a supercomputer cluster called Savio. It was a bare-metal way to analyze data; I had to move CSVs between machines. Then at LinkedIn, I had Hadoop + Pig / Scala Spark. That abstracted away most of the work, but I spent hours tuning jobs and got headaches manipulating HDFS directories. Later I joined Snowflake, and was like: holy, data analysis can be this simple! I can just use SQL to do everything within the data warehouse! I asked myself: why can't we make something like Snowflake for unstructured data? That was the impulse behind Roe.ai, and it's been driving me ever since.

To get started, you can sign in at https://app.roe-ai.com/ and there are docs at https://docs.roe-ai.com/. You can load unstructured data via our SQL and File API, the Snowflake staging data connector, the S3 blob storage data connector, the Roe AI Zap on Zapier, or the SQL function load_url_file() to fetch a file from a URL.

Some logistics: the product is free to start, and we've preloaded $50 of AI credits, enough to process 3,000 one-page PDFs. If you use all $50, just email us and we'll give you more. The product is not open source because it is too complex to self-host, but let us know if you see potential for open-sourcing it.

The product is early and may have bugs and UX problems. It'd be incredible if you could give it a spin anyway; we hope you'll find it interesting and let us know what you think! Jason and I will be around in the thread and are really interested in hearing from you!




Congrats on the launch. Sounds cool and potentially useful, but I don't want to read blog posts or book a demo. I'd put a proper video at the very top of the page instead of the animated typing you currently have.

FYI your <title> tag needs to be updated.


Good points! We'll update our landing pages as you suggested.


Bridging the gap between AI and data warehouses is crucial, but I’m not sure SQL is the best fit for AI engineers who mainly work with Python and AI APIs.

At DataChain, we are solving this by creating a Python API that translates to SQL under the hood, which is pretty easy now with Pydantic. https://github.com/iterative/datachain
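For readers curious what "a Python API that translates to SQL under the hood" can look like in general, here is a minimal, dependency-free sketch of the pattern (this is not DataChain's actual API; it uses a plain annotated class where the real thing would use Pydantic models):

```python
# General pattern: derive SQL DDL from a typed Python class so that
# Python users never write schema definitions by hand. Type names and
# the mapping table are illustrative assumptions.
from typing import get_type_hints

_SQL_TYPES = {int: "BIGINT", float: "DOUBLE", str: "TEXT", bool: "BOOLEAN"}

class Frame:
    path: str
    timestamp: float
    label: str

def to_create_table(cls: type, table: str) -> str:
    """Map each annotated attribute to a SQL column declaration."""
    cols = ", ".join(
        f"{name} {_SQL_TYPES[py_type]}"
        for name, py_type in get_type_hints(cls).items()
    )
    return f"CREATE TABLE {table} ({cols})"

print(to_create_table(Frame, "frames"))
# -> CREATE TABLE frames (path TEXT, timestamp DOUBLE, label TEXT)
```

The same field metadata can drive SELECT projections and result deserialization, which is what makes the Pydantic route convenient.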

WDYT?


Right, our product is designed for data practitioners who want snappy data analytics on unstructured data.

Thanks for sharing your project, super cool idea! What would it take to integrate our SQL engine with DataChain?


The open-source version uses SQLite. The SaaS version uses proprietary data warehouses, which is where your engine could be integrated.


I am glad to see people focusing on this.

If this tool could parse drug patents and draw molecular structures with the associated data, I know we would pay $200k/yr+ for that service, and there's a market for it.

In my own field, there's an incredibly important application in parsing patents and scientific papers, but this would require specific image-to-text models in order to get the required information out with high fidelity. Do you guys have plans to enable user-supplied workflows where perhaps image patches can be sent to bespoke encoders or fine-tunes?


You can use the https://github.com/iterative/datachain mentioned by @dmpetrov to predict and draw (in SaaS) a molecular structure. Beyond prediction, you can also: enrich the PDF data with external PDB data; calculate and evaluate sequence- and structure-based predictions made by multiple custom models; and optimize time and resources.

I created some simple examples in this area a few months ago. Feel free to email me at mikhail@iterative.ai if you're interested; I'm happy to share my findings.


Today's large vision models like GPT-4o can parse content-heavy papers pretty well (and respect their structure).

Yeah, basically it lets you send PDF pages as image patches into GPT-4o; that workflow can be built easily.
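A hedged sketch of that workflow, assuming the OpenAI chat API's image-input message format (rendering pages to PNGs, e.g. with pdf2image, is not shown, and the model call is left as a comment since it needs an API key):

```python
# Build one chat message that mixes a question with PDF pages rendered
# as base64-encoded PNG images, in the OpenAI vision-input format.
import base64

def pages_to_message(page_pngs: list[bytes], question: str) -> dict:
    """One user message: text part first, then one image part per page."""
    parts = [{"type": "text", "text": question}]
    for png in page_pngs:
        b64 = base64.b64encode(png).decode()
        parts.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": parts}

# Then, with an OpenAI client (requires an API key):
#   client.chat.completions.create(
#       model="gpt-4o",
#       messages=[pages_to_message(pngs, "List every claim in this patent.")])
```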

Feel free to send me an email at richard@roe-ai.com; happy to evaluate your case and try to save you that $200K :p


When you say parse - do you mean for prior art or to generate ideas?


I think by "parse" they mean something more like document understanding.


Congrats on the launch! What are you using to make the LLM understand a video file?

Are you doing transcription plus sending frames to a vision model, or is there a third-party service for this?


We use Gemini to analyze the video in its raw format.


Why this when I can just use PostgreSQL and pgvector? Like in this example I found recently: https://www.lycee.ai/courses/91b8b189-729a-471a-8ae1-717033c...


To add to Jason's point -

There is a big UI component here: for multimodal data analytics, we think it's crucial for people to be able to see and hear the data.

For RAG-style search, many DBs have built-in vector search, but chunking, indexing, and maintaining the index are left to you. That may not be a problem for technical people, but it's a hassle for data people who own hundreds of data products within a company. So we built a semantic search index builder that lets you create an auto-refreshing semantic search index with no code, without ever having to manage your own vectors.
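To make the "chunking, indexing, and maintaining are on your own" point concrete, here is a minimal sketch of the loop a data team would otherwise own themselves (the embedding call and SQL are illustrative comments; only the chunker is real code):

```python
# The simplest chunking strategy: fixed-size character windows with
# overlap so context isn't cut off at chunk boundaries.

def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into `size`-char chunks overlapping by `overlap` chars."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# Each refresh cycle would then, for every new or changed document:
#   for c in chunk(doc_text):
#       vec = embed(c)                      # some embedding model/API
#       db.execute("INSERT INTO chunks (doc_id, body, embedding) "
#                  "VALUES (%s, %s, %s)", (doc_id, c, vec))
# ...plus handling deletions, re-chunking on strategy changes, and
# scheduling -- the parts an auto-refreshing index builder takes over.
```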

In addition, data analysts often need to interrogate the search results further. For example, say we've used pgvector to find all the photos related to the Golden Gate Bridge, but then we want to ask which of those images shows someone wearing a blue shirt. We have to apply another model, and that's outside a normal DB's responsibility.
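The two-stage pattern being described can be sketched as follows (the SQL and the vision-model call are illustrative comments; the similarity helper shows the metric pgvector's cosine operator computes):

```python
# Stage 1 narrows candidates by embedding similarity; stage 2 applies a
# second model to answer the follow-up question. Table/column names are
# hypothetical.
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity; pgvector's <=> operator returns 1 - this value."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stage 1 (inside Postgres, pgvector cosine-distance KNN):
#   SELECT path FROM photos
#   ORDER BY embedding <=> %(query_vec)s LIMIT 50;
#
# Stage 2 (outside the DB -- the part pgvector alone can't do):
#   for path in hits:
#       answer = vision_model(path, "Is anyone wearing a blue shirt?")
```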


Great question! The answer is twofold: 1. Unlike a vector database, VolansDB also stores the files (pointers) directly in the table, in addition to supporting search. So you can manage files (RBAC, etc.) as table cells, apply batch data-processing jobs easily with SQL, and even get unstructured-data lineage and pipelines. 2. VolansDB is columnar, so it's optimized for analytical use cases rather than product-DB access patterns.


Not saying Roe is the next Dropbox, but the same sort of thing was said when Dropbox did their Show HN…


I don't think OP's question is the same as the one asked about Dropbox. The Dropbox question was about why the service was needed at all when you could use Unix tools like rsync to achieve the same thing. The answer to that was simple: not everyone is a tech-savvy user who wants to mess with command lines. On the other hand, OP's question was about whether there's already a mainstream database and extension that does what Roe AI does. They got a response, and it was helpful. The "You can do this with rsync!" argument became a meme, but now it's almost overshadowed by the knee-jerk "That's what they said about Dropbox!" response.


This is really getting old and abused. If that were the case, Roe would be up against at least 20 other "Dropboxes for AI" at YC alone.


Indeed, there's a big survivorship bias at play. The Dropbox refrain is notable precisely because Dropbox is the exception, being successful, not the norm. People don't realize how many failed startups drew the same old HN comment and the same "that's what they said about Dropbox" defense.


It's the same as the quote we keep hearing: "every big new thing started out looking like a toy." Somehow a bunch of people who should know better (intentionally?) misinterpret that as meaning every new toy is the next big thing.


Yep, well said. It's hard to keep the direction of the logic straight if one isn't well versed in it.


Congratulations on the launch! I've been researching unstructured data management for some time, and I'm glad new tools have appeared.


This is awesome :) Can we use this directly on our entire DB?


Likely! Can you elaborate on your use case and which DB you use?


Is this more for data engineers or data analysts?

Seems like the type of thing that would be very useful in helping build data pipelines on semi-structured data.


I guess, to add to Jason's point, it depends on how data engineers and data analysts are perceived in their roles within the company. At some companies, we see a data analyst taking end-to-end responsibility from data engineering to BI; at others there's a clear separation: data engineers do data pipelining and data modeling, while data analysts are, in fact, business analysts. Regardless, we think SQL is the common interface for both parties, and we're excited to see who the power users will be.


Right now it's more for data analysts whose data eng team doesn't have the capacity to support every type of data-processing requirement. Data analysts can just do it themselves with SQL! But we're also open to exploring opportunities for data eng teams if we see a strong use case for automating their data pipelines.


Why not just focus on the UI part and make it integrate with different data sources?


A lot of infrastructure work is needed to make the SQL experience work seamlessly for unstructured data. And for the most part, we do fork an open-core data warehouse and build on top of it.


Will this work with Redshift via SQL interface? Or am I looking at this wrong?


This does not work with Redshift. It's a query engine for unstructured data like documents, images, and videos; that data doesn't quite fit into a Redshift/BigQuery-style data warehouse.


You are on to something here. Look forward to seeing this evolve.


Thanks!


Why the name? It sounds like it will be about US politics.


The rest of the world thinks of fish eggs.



