Hacker News new | past | comments | ask | show | jobs | submit login
What's Postgres got to do with AI? (crunchydata.com)
114 points by craigkerstiens on Feb 24, 2023 | hide | past | favorite | 31 comments



This is about how to build a recommendation engine into postgres by storing embeddings from OpenAI in a table and querying them with pg_vector.


Thank you for your sacrifice.

(apropos: now there's an AI feature I can get behind. Automatic summarization of links, so we can filter the clickbait).


You would probably appreciate Kagi’s Universal Summarizer: https://labs.kagi.com/ai/sum


I don’t know.

But I do know that whatever the question, Postgres is the answer.


Answer belongs to the user. Postgres is just one good tool.


Distributed transactions across a database cluster?


You can do it with extensions. However most of the time you are not going to need it.


Ah so it isn't the answer for everything.


Downvoted for blasphemy


Replying to myself so that you can pay me back


true.

what IS the answer for everything, by the way?


42.


Foreign data wrappers.

"During a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction. The remote transaction is committed or aborted when the local transaction commits or aborts. Savepoints are similarly managed by creating corresponding remote savepoints."


does it support two-phase commit? What if local transaction succeeded, and remote failed? Cluster will have inconsistent data?..


The mental model is slightly different. FDWS are basically server -> server linkages, not literally clustering. As a result, data consistency between servers is less of an issue - the data is likely to only live in one place (read replicas and failovers notwithstanding). There may be some error handling and try/rollback around remote transactions baked into postgres_fdw but I'm not familiar enough to say.


the citus extension might be your friend.


An interesting pr piece about building a recommendation engine. I fully believe this a PR piece to drive traffic because postgres has very little to do with AI beyond being a database.

My firm looked into Postgres extremely deeply aka source code around text search and AI. I would highly recommend against using this, and stick to more standard ways of doing NLP. Postgres still has open bugs around text search and lifting limits from 5 years ago. We concluded we couldn't trust anything but the core ecosystem around it.


the rum index extension we use for text search of +TB sized document databases.


Were you able to work around the positional limits for tsvector ?


GIN index (rum uses it) creation is not parallel and can be done in single thread only. Does it take forever to index your TB of documents?


Couldn't you implement this in "pure" SQL by using psql-http https://github.com/pramsey/pgsql-http (from Crunchy ) for the webservice calls to OpenAI API?



Another way that I am experimenting with is to combine AI and Postgres to build SQL queries and then run them directly. In this way users can gain valuable data insights (essentially building data dashboards themselves) without bugging the data analyst [1]. Still in beta.

And as the article notes "Postgres is equipped" to query, format and return the data needed.

[1]: https://aihelperbot.com/videos/preview-use-ai-to-become-your...


I made an app like that for myself using OpenAI (codex). While it works pretty good for basic stuff (near 100%) it fails hard when you need some more advanced queries.

ie.: I asked it to write a query to check how many times "user A" triggered "event X" before triggering "event Y" (with a column of their timestamps).

Never returned the right answer.


Something like this: https://aihelperbot.com/snippets/cleitmbiq0018mf20pr281yau

Beyond the prompt as you see above, I added the events table with id, userName, eventType and date field. Adding your database tables gives (in my experience) high accuracy.



For large numbers of vectors you will probably want sub-linear search time. HNSW for example. Does pg_vector support this?


I've been doing the cosine similarity in plain ruby while I await the availablity of pg_vector on AWS RDS, scales well enough for small datasets for prototypes(talking less than 100 here, although haven't load tested it yet).


AI and tables? Factor Tables! https://github.com/RowColz/AI


how does pg_vector scale beyond some toy examples?


It's indexable, so just fine: https://github.com/pgvector/pgvector#indexing




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: