EVA: AI-Relational Database System (evadb.readthedocs.io)
113 points by jonbaer on May 12, 2023 | 29 comments


How does Eva compare to MindsDB (https://mindsdb.com)?


Great question, @ukuina! We will need some time to do a detailed analysis of MindsDB, but here are some key differences at first glance:

1. First-class support for unstructured data: EVA natively supports querying over videos, audio, etc. For instance, it manages a video as a table, mapping each tuple to a video frame.

2. Cascades-style query optimizer: EVA has an AI-centric query optimizer with built-in support for cost-based query optimization, predicate reordering, caching the results of expensive model calls, etc.

3. First-class support for vector embeddings: EVA supports similarity search using embeddings stored in a vector database.
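
The predicate reordering in point 2 can be illustrated with the classic rank-ordering heuristic for expensive, independent filters: evaluate cheap, selective predicates first so that costly model calls see fewer tuples. This is a minimal sketch, not EVA's optimizer; the `Predicate` class, the UDF names `ObjectDetector` and `Color`, and all cost and selectivity numbers are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Sequence


@dataclass(frozen=True)
class Predicate:
    """Hypothetical filter with a per-tuple cost and a selectivity in [0, 1]."""
    name: str
    cost: float         # assumed cost of evaluating the predicate on one tuple
    selectivity: float  # assumed fraction of tuples that pass


def expected_cost(order: Sequence[Predicate]) -> float:
    """Expected per-tuple cost with short-circuit AND, assuming independent predicates."""
    total, surviving = 0.0, 1.0
    for p in order:
        total += surviving * p.cost  # only surviving tuples reach this predicate
        surviving *= p.selectivity
    return total


def reorder(predicates: Sequence[Predicate]) -> List[Predicate]:
    """Rank ordering: ascending (selectivity - 1) / cost puts cheap, selective filters first."""
    return sorted(predicates, key=lambda p: (p.selectivity - 1) / p.cost)


def brute_force(predicates: Sequence[Predicate]) -> List[Predicate]:
    """Try every order; only feasible for a handful of predicates."""
    return list(min(permutations(predicates), key=expected_cost))


preds = [
    Predicate("ObjectDetector(data) finds a car", cost=50.0, selectivity=0.3),
    Predicate("id < 1000", cost=0.01, selectivity=0.1),
    Predicate("Color(data) = 'red'", cost=5.0, selectivity=0.5),
]

best = reorder(preds)
print([p.name for p in best])
print(expected_cost(preds), "->", expected_cost(best))
```

With these made-up numbers the reordered plan costs about 3.01 units per tuple versus about 50.15 for the original order, and the brute-force search picks the same plan as the rank ordering.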

Similarities to MindsDB:

1. EVA is implemented in Python and uses SQLAlchemy for connecting to a structured database system (PostgreSQL, MySQL, etc.)

2. EVA supports Hugging Face and OpenAI pipelines in the form of user-defined functions.

We would love to hear from the developers of MindsDB whether our high-level analysis is correct :)


The concept of using SQL-like syntax to interface with software very different from SQL is interesting. Are there applications other than EVA that do this? It'd be cool to have a sports-footage query engine using syntax like this.


We also love SQL, @potatoman22 :)

BTW, EVA itself is designed for analyzing sports games, such as detecting touchdowns in a football game. This notebook shows how an action recognition model can be run on an American Sign Language video to identify the correct word [1]. Similarly, a model for identifying touchdowns and players can be used to run interesting queries over a football game.

   SELECT FIRST(id), ASLActionRecognition(SEGMENT(data))
   FROM ASL_ACTIONS
   SAMPLE 5
   GROUP BY '16f';
[1] https://github.com/georgia-tech-db/eva/blob/master/tutorials...
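
To make the shape of this query concrete, here is a plain-Python sketch of what it computes. It assumes, without confirmation from the thread, that `SAMPLE 5` keeps every 5th frame, that `GROUP BY '16f'` chunks the sampled frames into consecutive 16-frame segments, and that `FIRST(id)` labels each segment with its first frame id. `fake_asl_classifier` is a hypothetical stand-in for `ASLActionRecognition`.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[int, bytes]  # (id, data): one tuple per video frame


def sample(frames: Sequence[Frame], rate: int) -> List[Frame]:
    """Assumed SAMPLE <rate> semantics: keep every rate-th frame."""
    return list(frames[::rate])


def segments(frames: Sequence[Frame], size: int) -> List[List[Frame]]:
    """Assumed GROUP BY '<size>f' semantics: consecutive chunks of `size` frames.
    The trailing partial chunk is kept here; that is also an assumption."""
    return [list(frames[i:i + size]) for i in range(0, len(frames), size)]


def run_query(frames: Sequence[Frame],
              classify: Callable[[List[bytes]], str]) -> List[Tuple[int, str]]:
    rows = []
    for seg in segments(sample(frames, 5), 16):
        first_id = seg[0][0]                   # FIRST(id)
        label = classify([d for _, d in seg])  # UDF applied to SEGMENT(data)
        rows.append((first_id, label))
    return rows


def fake_asl_classifier(segment: List[bytes]) -> str:
    """Hypothetical stand-in for ASLActionRecognition: reports the segment length."""
    return f"{len(segment)} frames"


video = [(i, b"") for i in range(200)]  # 200 dummy frames
print(run_query(video, fake_asl_classifier))
```

Sampling first means the expensive model only ever sees one fifth of the frames, which is presumably why the query samples before grouping.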





I wonder how well this will scale for large datasets. Can anyone provide some benchmarks or performance comparisons against other database systems?


Great question! We have a built-in integration test for 1 million images [1] and have benchmarked EVA on large unstructured datasets for our ACM SIGMOD paper [2].

Coming up with an open benchmark for AI-Relational database systems is essential [3]. Please share any ideas you have for benchmarking :)

[1] https://github.com/georgia-tech-db/eva/blob/e123820c79b902d5...

[2] https://dl.acm.org/doi/abs/10.1145/3514221.3526142

[3] https://dl.acm.org/doi/10.1145/3299869.3324955


Hey friends

We wanted to express our heartfelt gratitude to all of you for your support for the EVA AI-Relational Database System last week [1]. The feedback from the HN community has been truly overwhelming :)

We were planning to make an HN post about the features we have added since then [2], but @jonbaer thankfully beat us to it :)

    - Using ChatGPT/LLM + Whisper (OpenAI + Hugging Face) models to ask questions about videos, tables, etc. You can check out the notebook here [3].

        LOAD VIDEO 'russia_ukraine.mp4' INTO VIDEOS;

        CREATE UDF SpeechRecognizer
                TYPE HuggingFace
                'task' 'automatic-speech-recognition'
                'model' 'openai/whisper-base';

        CREATE MATERIALIZED VIEW
            TEXT_SUMMARY(text) AS
            SELECT SpeechRecognizer(audio) FROM VIDEOS;

        SELECT ChatGPT('Is this video summary related to Ukraine-Russia war', text) FROM TEXT_SUMMARY;

    - Joining structured data tables with OCRExtractor model output [4]

        SELECT * FROM MyImages 
            JOIN LATERAL OCRExtractor(data) AS T(a,b,c) 
            JOIN LicensePlateCSV B 
            ON FuzzDistance(T.a, B.label) > 50;

    - Vector index support for similarity search [5]

        CREATE INDEX reddit_sift_object_index
            ON reddit_object_dataset (SiftFeatureExtractor(Crop(data, bboxes)))
            USING HNSW;

        SELECT name FROM reddit_object_dataset
            ORDER BY Similarity(
                SiftFeatureExtractor(Open('{path}')),
                SiftFeatureExtractor(data)
            )
            LIMIT 5;

    - Using YOLO models to detect objects [6]

        CREATE UDF IF NOT EXISTS Yolo
            TYPE ultralytics
            'model' 'yolov8m.pt';

        SELECT id, Yolo(data)
            FROM ObjectDetectionVideos
            WHERE id < 20;
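
In the ChatGPT/Whisper example above, the materialized view matters because speech recognition is the expensive step: it runs once per video, and later LLM questions read the stored transcript instead of re-running Whisper. Below is a rough analogue using Python's built-in `sqlite3`. SQLite has no materialized views, so `CREATE TABLE ... AS SELECT` stands in for one, and `speech_recognizer` is a hypothetical stand-in that only counts its calls; it is not Whisper and not EVA's API.

```python
import sqlite3

calls = {"n": 0}


def speech_recognizer(audio: str) -> str:
    """Hypothetical stand-in for an expensive ASR model: counts its invocations."""
    calls["n"] += 1
    return f"transcript of {audio}"


conn = sqlite3.connect(":memory:")
conn.create_function("SpeechRecognizer", 1, speech_recognizer)

conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, audio TEXT)")
conn.executemany("INSERT INTO videos (audio) VALUES (?)", [("clip_a",), ("clip_b",)])

# Materialize the expensive UDF's output once (stand-in for CREATE MATERIALIZED VIEW).
conn.execute(
    "CREATE TABLE text_summary AS SELECT SpeechRecognizer(audio) AS text FROM videos"
)

# Repeated downstream queries read the stored text; the UDF is not invoked again.
for _ in range(3):
    rows = conn.execute("SELECT text FROM text_summary ORDER BY rowid").fetchall()

print(rows, calls["n"])
```

The model runs once per row at materialization time, no matter how many follow-up questions are asked.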

Your feedback and suggestions have already been instrumental in shaping our roadmap and prioritizing new features! We are actively working on incorporating your ideas and addressing your concerns to enhance the functionality, performance, and user experience of EVA. Please email me (arulraj@gatech.edu) with any questions, ideas, or suggestions about EVA.
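
The OCR example above pairs rows with `FuzzDistance(T.a, B.label) > 50` because OCR output rarely matches a license plate string exactly. A minimal nested-loop sketch of such a fuzzy join follows. It assumes a 0-100 similarity scale and uses the standard library's `difflib.SequenceMatcher.ratio()` scaled by 100 as a stand-in score, which is not necessarily how EVA computes `FuzzDistance`; the plate strings are made up.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Row = Dict[str, str]


def fuzz_score(a: str, b: str) -> float:
    """Stand-in similarity on a 0-100 scale (difflib ratio * 100)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def fuzzy_join(ocr_rows: List[Row], plate_rows: List[Row],
               threshold: float = 50.0) -> List[Tuple[Row, Row]]:
    """Nested-loop join keeping pairs whose similarity exceeds the threshold."""
    return [
        (ocr, plate)
        for ocr in ocr_rows
        for plate in plate_rows
        if fuzz_score(ocr["a"], plate["label"]) > threshold
    ]


ocr_output = [{"a": "ABC 1234"}, {"a": "XYZ-9O87"}]  # OCR text, incl. an O/0 misread
plates = [{"label": "ABC1234"}, {"label": "XYZ9087"}, {"label": "QQQ0000"}]

for ocr, plate in fuzzy_join(ocr_output, plates):
    print(ocr["a"], "->", plate["label"])
```

A nested loop compares every pair, so it is only reasonable for small tables; the threshold trades missed matches against false ones.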

[1] https://news.ycombinator.com/item?id=35764355

[2] https://github.com/georgia-tech-db/eva/releases/tag/v0.2.3

[3] https://evadb.readthedocs.io/en/stable/source/tutorials/08-c...

[4] https://github.com/georgia-tech-db/eva/blob/master/tutorials...

[5] https://github.com/georgia-tech-db/eva/blob/a6a6ecc7e5c0d9d4...

[6] https://evadb.readthedocs.io/en/stable/source/tutorials/02-o...
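
Regarding the vector index example above: an HNSW index answers approximate nearest-neighbor queries, and the query it speeds up is the exact ranking sketched here, which scores every row against the query embedding and keeps the top k. This assumes Euclidean (L2) distance and that a smaller `Similarity` value means a closer match, so `ORDER BY ... LIMIT 5` returns the nearest rows; the file names and toy 3-d vectors are invented, not real SIFT features.

```python
import heapq
import math
from typing import Dict, List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], rows: Dict[str, Sequence[float]], k: int = 5) -> List[str]:
    """Exact k-nearest neighbours by full scan: what an ANN index like HNSW approximates."""
    return heapq.nsmallest(k, rows, key=lambda name: l2(query, rows[name]))


embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.9, 0.4],
    "bus.jpg": [0.1, 0.8, 0.5],
}

print(top_k([1.0, 0.0, 0.0], embeddings, k=2))
```

The full scan costs one distance computation per row; HNSW avoids it by walking a layered proximity graph, trading a little recall for much lower latency.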


Looks super interesting.

Btw, this page returns a 404: https://evadb.readthedocs.io/en/stable/source/overview/video...


Thank you for reporting the typo. We are working on fixing it. In the meantime, you can find the page here: https://github.com/georgia-tech-db/eva/blob/master/docs/sour...


Almost completely unrelated, but due to the naming and some aspects of the aesthetic, I can't help but associate this with that other EVA from the '90s :)

https://www.youtube.com/watch?v=N75H5amC8Is


Haha. Thanks for sharing this reference! I had no idea about a similar comment [1] in our previous Show HN post :) For others like me, this is ChatGPT's explanation:

    C&C Tiberian Sun, short for Command & Conquer: Tiberian Sun, is a real-time strategy (RTS) video game developed by Westwood Studios and released by Electronic Arts in 1999. It is the sequel to the highly popular game Command & Conquer (1995) and is part of the Command & Conquer series.

    In the real-time strategy game Command & Conquer: Tiberian Sun, the phrase "Welcome back, Commander" is a memorable reference that occurs at the start of each mission when the player resumes playing the game after a break or reloading a saved game.

    The phrase is spoken by the EVA (Electronic Video Agent) computerized voice, which serves as the player's AI assistant throughout the game. The EVA voice welcomes the player with the line "Welcome back, Commander" to indicate that the player has returned to the game and is ready to continue commanding their forces.
[1] https://news.ycombinator.com/reply?id=35768822


This sounds a bit unlikely. Would they really have different mission intros after detecting you reopen the game?

Anyway, now that you've written it down, it'll get scraped, included in the training set, and be a reliable citation for the future!


As this was about a quarter-century ago now, I'm not certain, but I too suspect that the stochastic parrot is improvising a bit on the details. As I recall it, the line was certainly not played at the start of every mission nor after every load.

Another fun piece of trivia for those who missed direct experience of the original: the video I linked to is not exactly a normal mission intro; it played at the start of the game, where it was particularly well placed in relation to the installer, as can be seen here https://www.youtube.com/watch?v=X3S6_3f4HhU


I'm reminded of the early (1970) electronic music track by Jean-Jacques Perrey, apparently a tribute to Neil Armstrong:

https://youtu.be/HqEz6PuQS08


The problem with this is that relational databases are completely foreign to how humans store and retrieve information. For humans, our data is all connected, and we operate on it primarily through hierarchical inference and reasoning. The relational database separates the data into tables and is terrible at hierarchical queries, so it is a 20th-century idea that should be deprecated when working toward intelligence.


There's nothing wrong with relational databases. There's something deeply wrong with how developers build relational databases. You can build databases in a way that meshes very cleanly with how humans think; people just don't. When a database developer makes an Asset table and a Building table and a Vehicle table, he's explicitly saying that Assets are definitely not Buildings and Vehicles are definitely not Assets and Buildings are definitely not Vehicles. That's the part that's completely foreign to how humans think: in reality, things are kinda like other things. No, Table Inheritance does not solve this problem (does it solve the problem in an OO language? No!). Again: you could build relational databases that mesh with how humans think, you just don't.

Also underlying your comment is this idea:

> Anything foreign to how humans think cannot help achieve intelligence.

Or:

> Human intelligence is the only possible kind of intelligence.

Bullshit. You could just as easily have written off fixed-wing aircraft by saying they're completely foreign to how birds fly. It's silly to think that our particular form of intelligence is the only possible form of intelligence.


If you can never assume that any two concepts are distinct from each other, how do you model your database? I'd be curious how you would model the example you stated as a relational database that mimics how humans think.


Relational databases are an implementation of the relational model of data, which itself is an implementation of first order predicate logic, which of course was invented by humans. If AI can figure out human language, it should be able to handle human logic as well. We’re already seeing that with AI code generators and co-pilots anyway.


There are lots of things invented by humans that don't model how human intelligence works.


Interesting observation, @gibsonf1. What kind of hierarchical queries do you have in mind that would be challenging to write in SQL? Would you mind sharing them in natural language?


Our conceptual awareness AI uses recursive conceptual inference: given some entity, the system looks at its states and determines which concepts can be inferred directly from them, then proceeds to infer that the entity also inherits all the properties of the genus concepts of those starting conceptual types. If you follow the recursion, it can take quite a large number of jumps, but it is incredibly fast.

For our data, we use Apache Solr, with a schema modeled on how we think the human schema works; it's quite simple, actually.
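
One way to picture the recursive step described above, where an entity inherits the properties of every genus concept up the chain, is a transitive walk over an is-a graph. This is a small sketch under that interpretation only; the concepts, properties, and the `is_a` and `properties` dictionaries are hypothetical and not taken from the commenter's system.

```python
from typing import Dict, List, Set

# Hypothetical concept hierarchy: each concept lists its direct genus concepts.
is_a: Dict[str, List[str]] = {
    "sedan": ["car"],
    "car": ["vehicle", "asset"],  # a concept may have several genera
    "vehicle": ["physical_object"],
    "asset": [],
    "physical_object": [],
}

# Hypothetical properties attached directly to each concept.
properties: Dict[str, Set[str]] = {
    "sedan": {"four_doors"},
    "car": {"has_engine"},
    "vehicle": {"can_move"},
    "asset": {"has_owner"},
    "physical_object": {"has_mass"},
}


def ancestors(concept: str) -> Set[str]:
    """All genus concepts reachable from `concept` (iterative DFS, cycle-safe)."""
    seen: Set[str] = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def inherited_properties(concept: str) -> Set[str]:
    """Properties of the concept itself plus those of every ancestor."""
    result = set(properties.get(concept, set()))
    for parent in ancestors(concept):
        result |= properties.get(parent, set())
    return result


print(sorted(inherited_properties("sedan")))
```

The `seen` set keeps the walk linear in the number of reachable concepts even when the hierarchy has shared ancestors.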


Document databases operate on tree structures:

- MongoDB

- ZODB

- Others

- This is also how disk file systems operate

You access items by their path instead of by ID. The path also reflects how the data is physically stored, so fetching nearby data is cheaper.

To work with this kind of data in SQL, you need to construct recursive CTE queries. While it's possible, it is a bit awkward sometimes.
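
For comparison, this is roughly what such a recursive CTE looks like for fetching a subtree and rebuilding each node's path. It is a sketch using Python's built-in `sqlite3` and a hypothetical adjacency-list table `nodes(id, parent_id, name)`; a tree-structured store would give you the path directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO nodes (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "docs"),
        (3, 2, "guide.md"),
        (4, 2, "api.md"),
        (5, 1, "src"),
        (6, 5, "main.py"),
    ],
)

# Walk down from 'docs', appending each child's name to its parent's path.
subtree = conn.execute(
    """
    WITH RECURSIVE tree(id, path) AS (
        SELECT id, name FROM nodes WHERE name = 'docs'
        UNION ALL
        SELECT n.id, tree.path || '/' || n.name
        FROM nodes AS n JOIN tree ON n.parent_id = tree.id
    )
    SELECT path FROM tree ORDER BY path
    """
).fetchall()

print([row[0] for row in subtree])
```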


jsonb queries in Postgres have been very performant for us over TB+ data sets. However, we often materialize views of large JSON blobs, just because we can.


This looks interesting. Any plans to put this in a container?


Thanks for checking it out! Currently, we have a Docker image for deploying EVA [1]. We plan to release a Terraform config soon that will make it easier to deploy EVA DB on an AWS/Azure server with GPUs.

[1] https://github.com/georgia-tech-db/eva/tree/master/docker


Awesome. I'll give it a go. Thank you.



