
pgvector is very nice indeed, and you get to store your vectors close to the rest of your data. I have yet to understand the unique use case for dedicated vector DBs. It seems so annoying to have to query your vectors in a separate database, without being able to easily join or filter against the rest of your tables.

I stored ~6 million Hacker News posts, their metadata, and the vector embeddings in a cheap $20/month VM running pgvector. Querying is very fast. Maybe there's some penalty to pay once you get past a billion rows, but I'm happy so far.
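To make the "join/filter next to your vectors" point concrete, here is a minimal sketch of what such a query could look like. The table and column names (`posts`, `post_embeddings`, `score`) are hypothetical and not taken from the setup described above; the only pgvector-specific parts are the `<=>` cosine-distance operator and the `'[x,y,...]'` text form for vectors. The helper only builds the SQL and its parameters, so it runs without a database.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_posts_query(
    embedding: Sequence[float], min_score: int, limit: int = 10
) -> Tuple[str, tuple]:
    """Build a parameterized nearest-neighbour query that also filters on metadata.

    Assumed (hypothetical) schema:
      posts(id, title, score)
      post_embeddings(post_id, embedding vector(N))
    """
    sql = (
        "SELECT p.id, p.title\n"
        "FROM posts AS p\n"
        "JOIN post_embeddings AS e ON e.post_id = p.id\n"
        "WHERE p.score >= %s\n"                  # ordinary relational filter
        "ORDER BY e.embedding <=> %s::vector\n"  # pgvector cosine distance
        "LIMIT %s"
    )
    return sql, (min_score, to_vector_literal(embedding), limit)


sql, params = nearest_posts_query([0.1, 0.2, 0.3], min_score=50, limit=5)
print(sql)
print(params)
```

With a driver such as psycopg, the pair would be passed as `cursor.execute(sql, params)`. Whether the filter and the vector ordering perform well together depends on the indexes and data, which this sketch does not address.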




I'm working on some pricing info for pgvector. Can you share some more info about the Hacker News posts you've embedded?

* Which embedding model (or number of dimensions)?
* When you say 6 million posts, is it just the URL of the post, title, and author, or have you also embedded the content at the linked URL (be it HN or elsewhere)?

Cheers!


You can also store vectors or matrices in a split-up fashion, as separate rows in a table, which is particularly useful if they're sparse. I've handled huge sparse matrix expressions (add, subtract, multiply, transpose) that way, because numpy couldn't deal with them.
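A rough sketch of the rows-in-a-table idea, using SQLite from the Python standard library so it runs anywhere. The `(i, j, v)` layout, the table names, and the helper functions are assumptions for illustration, not the setup described above: each nonzero entry is one row, multiplication is a join on the shared index plus a `SUM`, addition is a `UNION ALL` plus a `SUM`, and transpose just swaps the index columns.

```python
import sqlite3


def load(conn, name, entries):
    """Store a sparse matrix as one row per nonzero entry: (row i, column j, value v)."""
    conn.execute(
        f"CREATE TABLE {name} (i INTEGER, j INTEGER, v REAL, PRIMARY KEY (i, j))"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", entries)


def matmul(conn, a, b):
    """C[i, j] = sum over k of A[i, k] * B[k, j], expressed as a join on k."""
    rows = conn.execute(
        f"SELECT x.i, y.j, SUM(x.v * y.v) FROM {a} AS x "
        f"JOIN {b} AS y ON x.j = y.i GROUP BY x.i, y.j"
    )
    return {(i, j): v for i, j, v in rows}


def add(conn, a, b):
    """Entrywise sum; entries that cancel to zero are dropped to keep the result sparse."""
    rows = conn.execute(
        f"SELECT i, j, SUM(v) FROM (SELECT i, j, v FROM {a} "
        f"UNION ALL SELECT i, j, v FROM {b}) GROUP BY i, j HAVING SUM(v) != 0"
    )
    return {(i, j): v for i, j, v in rows}


def transpose(conn, a):
    """Swap the row and column indices."""
    rows = conn.execute(f"SELECT j, i, v FROM {a}")
    return {(i, j): v for i, j, v in rows}


conn = sqlite3.connect(":memory:")
load(conn, "mat_a", [(0, 0, 1.0), (1, 1, 2.0)])  # [[1, 0], [0, 2]]
load(conn, "mat_b", [(0, 1, 3.0), (1, 0, 4.0)])  # [[0, 3], [4, 0]]

print(matmul(conn, "mat_a", "mat_b"))
print(add(conn, "mat_a", "mat_b"))
print(transpose(conn, "mat_b"))
```

Only nonzero entries are ever stored or touched, so the cost scales with the number of nonzeros rather than with the full matrix dimensions.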



