zh217's comments

Sorry about that ... I will revise it to be more consistent. Cozo is a bit ambiguous, so now it is usually called CozoDB.


The linked article explains these in detail.


They need to be put into distinct indices, and unfortunately you cannot “jump” between them in this case (if someone knows a way to achieve this, I would love to hear about it!)


I think I’m understanding that the item’s vector in one LLM can be stored as one index and the vector in another LLM can be stored as a second index without them colliding or one having to overwrite the other.

Is that right?


Yes, actually I already do that. SBERT is better than OpenAI's ada embeddings for many use cases.


Amazing. Then as far as I’m concerned, that functionally solves the problem until the industry figures out cross-embedding jumps.
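To make the "one index per embedding model" setup concrete, here is a minimal pure-Python sketch. It does not use CozoDB at all: the `EmbeddingIndex` class, the model names and the toy vectors are hypothetical, and a brute-force cosine scan stands in for a real HNSW index. It only shows that the same item can have a vector in each model's index without collision, and that a query has to be embedded by the same model as the index it searches.

```python
import math


class EmbeddingIndex:
    """Hypothetical brute-force index holding vectors from ONE embedding model.

    A stand-in for a real vector index such as HNSW, for illustration only.
    """

    def __init__(self, model_name: str, dim: int):
        self.model_name = model_name
        self.dim = dim
        self.vectors: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: list[float]) -> None:
        # Reject vectors whose dimensionality does not match this model.
        if len(vector) != self.dim:
            raise ValueError(f"{self.model_name} index expects dim {self.dim}, got {len(vector)}")
        self.vectors[item_id] = vector

    def search(self, query: list[float], k: int = 1) -> list[tuple[str, float]]:
        if len(query) != self.dim:
            raise ValueError("query must be embedded by the same model as the index")
        scored = [(item_id, _cosine(query, v)) for item_id, v in self.vectors.items()]
        # Highest cosine similarity first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# One index per model; "doc1" and "doc2" live in both without overwriting each other.
sbert = EmbeddingIndex("sbert", dim=3)
ada = EmbeddingIndex("ada", dim=4)

sbert.add("doc1", [1.0, 0.0, 0.0])
sbert.add("doc2", [0.0, 1.0, 0.0])
ada.add("doc1", [0.0, 0.0, 1.0, 0.0])
ada.add("doc2", [1.0, 0.0, 0.0, 0.0])

print(sbert.search([0.9, 0.1, 0.0]))       # doc1 ranks first in the sbert index
print(ada.search([0.9, 0.1, 0.0, 0.0]))    # doc2 ranks first in the ada index
```

The dimension check only catches the obvious mismatch. Two different models that happen to share a dimension would still give meaningless cross-model similarities, which is why the indices are kept apart by model rather than by vector shape.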


Great questions!

- As can be seen at https://docs.cozodb.org/en/latest/releases/v0.3.html, concurrent writes reach about 200K QPS with 24 threads on a pretty old server. I think that is enough for a small to medium social network.

- You can start independent instances and use them together in your user code. You can have as many as you like, but data can only be exchanged through your code: they can't talk directly to each other.

- If by git-like you mean point-in-time queries, yes that's what the feature is for. But git comes with lots of other things such as merge logic, etc. These need to be implemented outside CozoDB.

- We do use CozoDB for data storage in production systems ourselves, and we back up a lot. So far nothing disastrous has happened. Note that CozoDB does not have any meaningful concept of user/authentication/authorization (yet), so you must make sure that only trusted clients can reach it (only an issue if you use the standalone server, since the embedded DBs do not open any ports).
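For readers unfamiliar with point-in-time queries, here is a conceptual sketch in plain Python: every write is stored as a timestamped assertion or retraction, and a read "as of" time T returns the latest fact at or before T. This illustrates the general technique only; it is not CozoDB's implementation or API, and the `History` class and its methods are hypothetical. Note what it does not give you: there are no branches and no merge logic, which is the part of git that would have to live outside the database.

```python
from bisect import bisect_right


class History:
    """Hypothetical append-only history of timestamped facts with as-of reads."""

    def __init__(self):
        # key -> list of (timestamp, is_assert, value), kept sorted by timestamp
        self._log: dict[str, list[tuple[int, bool, object]]] = {}

    def put(self, key: str, ts: int, value) -> None:
        """Record that key holds value from time ts onward."""
        self._append(key, (ts, True, value))

    def retract(self, key: str, ts: int) -> None:
        """Record that key has no value from time ts onward (nothing is erased)."""
        self._append(key, (ts, False, None))

    def _append(self, key: str, entry: tuple) -> None:
        entries = self._log.setdefault(key, [])
        entries.append(entry)
        # Stable sort by timestamp: among writes with equal ts, the later one wins.
        entries.sort(key=lambda e: e[0])

    def get_as_of(self, key: str, ts: int):
        """Return the value visible at time ts, or None if absent or retracted."""
        entries = self._log.get(key, [])
        timestamps = [e[0] for e in entries]
        # Index just past the last entry with timestamp <= ts.
        i = bisect_right(timestamps, ts)
        if i == 0:
            return None
        _, is_assert, value = entries[i - 1]
        return value if is_assert else None


h = History()
h.put("alice/status", 10, "online")
h.put("alice/status", 20, "away")
h.retract("alice/status", 30)

print(h.get_as_of("alice/status", 5))   # None: nothing written yet
print(h.get_as_of("alice/status", 15))  # online
print(h.get_as_of("alice/status", 25))  # away
print(h.get_as_of("alice/status", 35))  # None: retracted at t=30
```

Because retractions are recorded rather than applied destructively, older snapshots stay readable; the cost is that storage grows with every write.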


As a graph “fanboy” I’m impressed, humbled, and inspired by the work that’s been done already and the direction you’re heading!

> Note that CozoDB does not have any meaningful concept of user/authentication/authorization (yet)

Please please please implement the Palantir security model unless you already have a smarter idea coming down the pipe. Palantir regularly scrubs past media from the internet, but there is a blog post that has the ACL slides from the now-private video: https://onetwo.ren/级联GraphQL访问控制/


Did some digging, I found this: https://documents.pub/document/palantir-access-control.html which appears to be the full slideshow


Perfect yes, thank you.


Could this be used for creating a memory system, with weights, and the ability to rewind thought chains? Would you be interested in partnering up? I'm not a database dev, but I have some great ideas, I'm already reaching out to investors to build something in AI, and I potentially have a partner. I'd love to build a database that is basically like the midbrain of AI: a hybrid database built specifically for AI memories and memory relations. If you're open to collaborating and building a product, perhaps my ideas could be a good 'test case' and be mutually beneficial to all of us. Email: patrickwcurl - gmail.


Amazing! Thank you, all very encouraging answers. Congrats on everything you've achieved with Cozo so far!!

One last question if possible. Is there a recommended way to do Full Text Search on data stored in Cozo?


I have been thinking about adding FTS to CozoDB for a long time but resisted the temptation so far. The reason is that text search is language-specific: what works for one language does not work for another. There is simply no way that CozoDB can duplicate the work of a dedicated text search engine for all the languages in the world.

Our current solution is to use mutation callbacks to synchronize texts to a dedicated text search engine. How you register the callback depends on the host language: for example, for Python: https://github.com/cozodb/pycozo#mutation-callbacks , and for Rust: https://docs.rs/cozo/latest/cozo/struct.Db.html#method.regis...
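The callback-based approach can be sketched without any external services. Below, a toy inverted index plays the role of the dedicated text search engine, and `on_mutation` plays the role of a mutation callback. The callback signature `(op, new_rows, old_rows)`, the op names `"put"`/`"rm"` and the `(id, text)` row layout are all assumptions made for this sketch; check the pycozo or Rust documentation linked above for the real registration call and arguments. The naive lowercase-and-split tokenizer is exactly the language-specific part a real engine does properly (it would fail on Chinese or Japanese text, for example).

```python
import re
from collections import defaultdict


class ToyTextIndex:
    """Minimal inverted index standing in for a dedicated text search engine."""

    def __init__(self):
        self.postings: dict[str, set] = defaultdict(set)
        self.docs: dict[object, str] = {}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Naive and English-centric: real engines use language-specific analyzers.
        return re.findall(r"[a-z0-9]+", text.lower())

    def upsert(self, doc_id, text: str) -> None:
        # Drop the old version's tokens first so stale terms stop matching.
        self.remove(doc_id)
        self.docs[doc_id] = text
        for token in self.tokenize(text):
            self.postings[token].add(doc_id)

    def remove(self, doc_id) -> None:
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for token in self.tokenize(old):
            self.postings[token].discard(doc_id)

    def search(self, query: str) -> set:
        """Return ids of documents containing every query token."""
        tokens = self.tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


text_index = ToyTextIndex()


def on_mutation(op: str, new_rows, old_rows) -> None:
    """Hypothetical mutation callback mirroring (id, text) rows into the text index."""
    if op == "put":
        for doc_id, text in new_rows:
            text_index.upsert(doc_id, text)
    elif op == "rm":
        # Assumed here: removed rows arrive in old_rows.
        for doc_id, *_ in old_rows:
            text_index.remove(doc_id)


on_mutation("put", [(1, "CozoDB stores relations")], [])
print(text_index.search("relations"))  # {1}
```

Keeping the search engine behind a callback means the database never has to know about tokenizers or stemmers; the trade-off is that the two stores can briefly disagree if the callback lags or fails.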


Sonic [1] might be a good fit, though it is not yet factored into a separate library [2].

[1]: https://github.com/valeriansaliou/sonic

[2]: https://github.com/valeriansaliou/sonic/issues/150


Thank you, that makes sense. Plus, with vector search there seem to be ways of shoehorning FTS into it. I could also potentially use SQLite storage and piggyback off SQLite FTS5, but I'm not sure how well that setup would work.


What about branching?


Wow, cellular sheaves, that's a connection I haven't thought of before!


Thank you!


Thanks for the suggestion--will surely do that!


Would love to have CozoDB be a part of LlamaIndex too! LlamaIndex has a bunch of integrations with existing vector DBs: https://github.com/jerryjliu/llama_index/tree/main/gpt_index...


For the parquet question: currently CozoDB is developed by a single developer (me), but I am starting to explore ways of expanding the development team. Certainly a lot more features will be added if that happens, and parquet support looks like a really useful one.


Any contact info for you to discuss contributing?


Yes, this is correct: only the query's answer set needs to be in memory. We are also working on streaming for the Rust API, in which case you don't even need to keep the whole set in memory for simple queries.

FYI, here is a not very rigorous performance and memory usage analysis (for a previous version without the vector search capability): https://docs.cozodb.org/en/latest/releases/v0.3.html
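The difference between materializing an answer set and streaming it can be illustrated with plain Python generators. This is only an analogy for the streaming work mentioned above, not CozoDB's Rust API: `scan_rows` is a hypothetical stand-in for a storage scan, and the standard-library `tracemalloc` module is used to compare the peak memory of the two styles.

```python
import tracemalloc


def scan_rows(n: int):
    """Hypothetical storage scan yielding (id, value) rows one at a time."""
    for i in range(n):
        yield (i, i * i)


def query_materialized(n: int) -> list:
    # Builds the whole answer set in memory before the caller sees any row.
    return [row for row in scan_rows(n) if row[1] % 3 == 0]


def query_streaming(n: int):
    # Yields matching rows lazily; only the current row needs to be alive.
    for row in scan_rows(n):
        if row[1] % 3 == 0:
            yield row


def peak_bytes(fn) -> int:
    """Run fn and return the peak traced allocation size in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


n = 200_000
materialized_peak = peak_bytes(lambda: sum(v for _, v in query_materialized(n)))
streaming_peak = peak_bytes(lambda: sum(v for _, v in query_streaming(n)))
print(materialized_peak > streaming_peak)  # True: the list holds every matching row
```

Both versions compute the same result; streaming only helps when the consumer can process rows one at a time (a sum or a filter), not when the query itself needs the whole set, for example to sort it.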


Thanks for the speedy response!

Cozo is looking like a top-contender for my project so far :)


Not for the moment, it is not polished enough. Right now it is just a webapp written in React and ProseMirror running on top of a CozoDB instance. And it is very rough around the edges (good enough for myself, but maybe not for others).

Once local LLMs that are powerful enough become available, though, I think I will try to find time to polish and publish it, since it can then act as a showcase for what a thinking agent can achieve.


How are you modeling the notes in CozoDB? I'd be interested in parsing my Obsidian notes into CozoDB as a way to set up alternate views on them. Curious how you thought about storing the bullets and the relations between them.

