What's Postgres got to do with AI?

rattray · on Feb 24, 2023

This is about how to build a recommendation engine into Postgres by storing embeddings from OpenAI in a table and querying them with pgvector.

dormento · on Feb 24, 2023

Thank you for your sacrifice.

(apropos: now there's an AI feature I can get behind. Automatic summarization of links, so we can filter the clickbait).

mathgorges · on Feb 26, 2023

You would probably appreciate Kagi’s Universal Summarizer: https://labs.kagi.com/ai/sum

andrewstuart · on Feb 24, 2023

I don’t know.

But I do know that whatever the question, Postgres is the answer.

revskill · on Feb 24, 2023

Answer belongs to the user. Postgres is just one good tool.

pjmlp · on Feb 24, 2023

Distributed transactions across a database cluster?

rastignack · on Feb 24, 2023

You can do it with extensions. However most of the time you are not going to need it.

pjmlp · on Feb 24, 2023

Ah so it isn't the answer for everything.

valenterry · on Feb 24, 2023

Downvoted for blasphemy

valenterry · on Feb 24, 2023

Replying to myself so that you can pay me back

johnthescott · on Feb 24, 2023

true.

what IS the answer for everything, by the way?

voytec · on Feb 24, 2023

CuriouslyC · on Feb 24, 2023

Foreign data wrappers.

"During a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction. The remote transaction is committed or aborted when the local transaction commits or aborts. Savepoints are similarly managed by creating corresponding remote savepoints."

riku_iki · on Feb 24, 2023

Does it support two-phase commit? What if the local transaction succeeds and the remote one fails? Will the cluster have inconsistent data?
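For readers unfamiliar with the term, here is a toy, in-memory sketch of the two-phase commit idea the question refers to. It is not how postgres_fdw behaves and it ignores coordinator crashes and recovery; the `Participant` class and `two_phase_commit` function are hypothetical names invented for illustration.

```python
class Participant:
    """Hypothetical stand-in for one server taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether this server's work succeeded
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A "yes" vote is a promise that commit can succeed later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on every participant only if all of them vote yes."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit everywhere or roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

With a local participant that can commit and a remote one that cannot, `two_phase_commit` returns `False` and both end up in the `"aborted"` state, which is the consistency property being asked about. PostgreSQL itself exposes the `PREPARE TRANSACTION` and `COMMIT PREPARED` commands as building blocks for this pattern.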

CuriouslyC · on Feb 24, 2023

The mental model is slightly different. FDWs are basically server -> server linkages, not literally clustering. As a result, data consistency between servers is less of an issue: the data is likely to live in only one place (read replicas and failovers notwithstanding). There may be some error handling and try/rollback around remote transactions baked into postgres_fdw, but I'm not familiar enough to say.

johnthescott · on Feb 24, 2023

the citus extension might be your friend.

stuckinhell · on Feb 24, 2023

An interesting PR piece about building a recommendation engine. I fully believe this is a PR piece to drive traffic, because Postgres has very little to do with AI beyond being a database.

My firm looked into Postgres extremely deeply, down to the source code, around text search and AI. I would strongly recommend against using this and suggest sticking to more standard ways of doing NLP. Postgres still has bugs, open for 5 years, around text search and lifting its limits. We concluded we couldn't trust anything but the core ecosystem around it.

johnthescott · on Feb 24, 2023

We use the rum index extension for text search over document databases larger than a TB.

stuckinhell · on Feb 24, 2023

Were you able to work around the positional limits for tsvector?

riku_iki · on Feb 24, 2023

GIN index creation (rum uses it) is not parallel and runs in a single thread only. Does it take forever to index your TB of documents?

sfkeller · on Feb 24, 2023

Couldn't you implement this in "pure" SQL by using pgsql-http https://github.com/pramsey/pgsql-http (from Crunchy) for the webservice calls to the OpenAI API?

sfkeller · on Feb 24, 2023

See also https://news.ycombinator.com/item?id=34684593

l5870uoo9y · on Feb 24, 2023

Another way that I am experimenting with is to combine AI and Postgres to build SQL queries and then run them directly. In this way users can gain valuable data insights (essentially building data dashboards themselves) without bugging the data analyst [1]. Still in beta.

And as the article notes "Postgres is equipped" to query, format and return the data needed.

[1]: https://aihelperbot.com/videos/preview-use-ai-to-become-your...

boredemployee · on Feb 24, 2023

I made an app like that for myself using OpenAI (Codex). While it works pretty well for basic stuff (near 100%), it fails hard when you need more advanced queries.

E.g.: I asked it to write a query to check how many times "user A" triggered "event X" before triggering "event Y" (with a column of their timestamps).

Never returned the right answer.
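For reference, one reading of that request is "count the X events a user triggered before that user's first Y event". A minimal sketch of that reading follows, run against an in-memory SQLite database as a stand-in for Postgres. The `events` schema, the sample rows, and the helper names are assumptions made for illustration (the column names follow the table described elsewhere in this thread), and other readings of the question would need a different query.

```python
import sqlite3

# Count first_event rows that happened before the user's earliest second_event row.
# If the user never triggered second_event, MIN() yields NULL and the count is 0.
QUERY = """
SELECT COUNT(*)
FROM events AS x
WHERE x.userName = :user
  AND x.eventType = :first_event
  AND x.date < (
      SELECT MIN(y.date)
      FROM events AS y
      WHERE y.userName = :user
        AND y.eventType = :second_event
  )
"""


def count_before(conn, user, first_event, second_event):
    """Return how many first_event rows precede the user's first second_event."""
    params = {"user": user, "first_event": first_event, "second_event": second_event}
    return conn.execute(QUERY, params).fetchone()[0]


def demo_connection():
    """Build a small, made-up events table for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, userName TEXT, eventType TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (userName, eventType, date) VALUES (?, ?, ?)",
        [
            ("A", "X", "2023-02-01T10:00:00"),
            ("A", "X", "2023-02-02T10:00:00"),
            ("A", "Y", "2023-02-03T10:00:00"),
            ("A", "X", "2023-02-04T10:00:00"),  # after Y, so not counted
            ("B", "X", "2023-02-01T09:00:00"),  # B never triggers Y
        ],
    )
    return conn
```

Timestamps are stored as ISO 8601 strings here so that plain string comparison orders them correctly; in Postgres a `timestamp` column would play the same role.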

l5870uoo9y · on Feb 24, 2023

Something like this: https://aihelperbot.com/snippets/cleitmbiq0018mf20pr281yau

Beyond the prompt as you see above, I added the events table with id, userName, eventType and date field. Adding your database tables gives (in my experience) high accuracy.

est · on Feb 24, 2023

see also https://news.ycombinator.com/item?id=34684593

janalsncm · on Feb 24, 2023

For large numbers of vectors you will probably want sub-linear search time, with HNSW for example. Does pgvector support this?

ac2u · on Feb 24, 2023

I've been doing the cosine similarity in plain Ruby while I await the availability of pgvector on AWS RDS. It scales well enough for small datasets for prototypes (talking less than 100 here, although I haven't load tested it yet).
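A minimal sketch of that brute-force approach, written in Python rather than Ruby. The function names and the tiny three-dimensional vectors are made up for illustration; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, items, k=3):
    """Score every stored vector against the query, so each lookup is linear in len(items)."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical stored vectors keyed by item name.
ITEMS = {
    "postgres": [0.9, 0.1, 0.0],
    "mysql": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
```

Calling `top_k([1.0, 0.0, 0.0], ITEMS, k=2)` ranks "postgres" ahead of "mysql". Because every lookup scans every vector, this only suits small datasets like the prototype described above.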

tabtab · on Feb 25, 2023

AI and tables? Factor Tables! https://github.com/RowColz/AI

bobosha · on Feb 24, 2023

How does pgvector scale beyond toy examples?

baudehlo · on Feb 24, 2023

It's indexable, so just fine: https://github.com/pgvector/pgvector#indexing