zh217 · on April 21, 2023

Sorry about that ... I will revise it to be more consistent. Cozo is a bit ambiguous, so now it is usually called CozoDB.

zh217 · on April 21, 2023

The linked article explains these in detail.

zh217 · on April 20, 2023

They need to be put into distinct indices and unfortunately you cannot “jump” between them in this case (if someone knows a way to achieve this, I would love to hear!)

joshspankit · on April 20, 2023

I think I’m understanding that the item’s vector in one LLM can be stored as one index and the vector in another LLM can be stored as a second index without them colliding or one having to overwrite the other.

Is that right?

zh217 · on April 20, 2023

Yes, actually I already do that. Sbert is better than openai ada embeddings for many use cases.
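To make the "one index per embedding model" idea concrete, here is a minimal pure-Python sketch. It does not use CozoDB's API: `MultiIndexStore`, the index names, and the toy vectors are all hypothetical, and a brute-force cosine scan stands in for a real vector index. The point is only that the same item can carry one vector per model, each in its own index, and that a query must use the index matching the model that produced it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


class MultiIndexStore:
    """Hypothetical store: one independent vector index per embedding model."""

    def __init__(self):
        self.indices = {}  # index name -> {item_id: vector}

    def add(self, index, item_id, vector):
        # Vectors for the same item in different indices never overwrite each other.
        self.indices.setdefault(index, {})[item_id] = list(vector)

    def search(self, index, query, k=1):
        # Brute-force scan; the query must come from the same model as the index.
        vectors = self.indices[index]
        ranked = sorted(vectors, key=lambda i: cosine(vectors[i], query), reverse=True)
        return ranked[:k]


store = MultiIndexStore()
# Toy 3-d vectors standing in for one model, toy 2-d vectors for another.
store.add("sbert", "note-1", [1.0, 0.0, 0.0])
store.add("sbert", "note-2", [0.0, 1.0, 0.0])
store.add("ada", "note-1", [0.0, 1.0])
store.add("ada", "note-2", [1.0, 0.0])

sbert_hit = store.search("sbert", [0.9, 0.1, 0.0])
ada_hit = store.search("ada", [0.9, 0.1])
```

Because the two indices have different dimensions and geometry, a result from one cannot be used to "jump" into the other; any link between them has to go through the shared item id.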

joshspankit · on April 20, 2023

Amazing. Then as far as I’m concerned that functionally solves the problem until the industry figures out cross-embedding jumps

zh217 · on April 20, 2023

Great questions!

- As can be seen at https://docs.cozodb.org/en/latest/releases/v0.3.html, about 200K QPS can be achieved for concurrent writes with 24 threads on a pretty old server. I think that is enough for a small to medium social network.

- You can start independent instances and use them together in your user code. You can have as many as you like, but data can only be exchanged through your code: they can't talk directly to each other.

- If by git-like you mean point-in-time queries, yes that's what the feature is for. But git comes with lots of other things such as merge logic, etc. These need to be implemented outside CozoDB.

- We do use CozoDB for data storage in production systems ourselves, and we back up a lot. So far nothing disastrous has happened. Note that CozoDB does not have any meaningful concept of user/authentication/authorization (yet), so you must make sure that only trusted clients can reach it (only an issue if you use the standalone server, since the embedded DBs do not open any ports).
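As a rough illustration of what a point-in-time query means (this is not how CozoDB implements it), the sketch below keeps every version of a value with its timestamp and answers "what was the value as of time t?". The `History` class and its method names are hypothetical. Merge logic and the other git-like features mentioned above would sit on top of something like this, in user code.

```python
import bisect


class History:
    """Hypothetical append-only store that keeps every timestamped version of a key."""

    def __init__(self):
        self._times = {}   # key -> sorted list of timestamps
        self._values = {}  # key -> values aligned with self._times[key]

    def put(self, key, ts, value):
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        pos = bisect.bisect_right(times, ts)  # keep both lists sorted by timestamp
        times.insert(pos, ts)
        values.insert(pos, value)

    def get_as_of(self, key, ts):
        """Return the latest value written at or before ts, or None if there is none."""
        times = self._times.get(key, [])
        pos = bisect.bisect_right(times, ts)
        if pos == 0:
            return None
        return self._values[key][pos - 1]


h = History()
h.put("mood", 10, "curious")
h.put("mood", 20, "happy")
```

Here `h.get_as_of("mood", 15)` sees only the version written at time 10, while a query at time 20 or later sees the newer one.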

joshspankit · on April 20, 2023

As a graph “fanboy” I’m impressed, humbled, and inspired by the work that’s been done already and the direction you’re heading!

> Note that CozoDB does not have any meaningful concept of user/authentication/authorization (yet)

Please please please implement the Palantir security model unless you already have a smarter idea coming down the pipe. Palantir regularly scrubs past media from the internet, but there is a blog post that has the ACL slides from the now-private video: https://onetwo.ren/级联GraphQL访问控制/

webstrand · on April 20, 2023

Did some digging, I found this: https://documents.pub/document/palantir-access-control.html which appears to be the full slideshow

joshspankit · on April 20, 2023

Perfect yes, thank you.

gremlinsinc · on April 21, 2023

Could this be used for creating a memory system, with weights, and the ability to rewind thought chains? Would you be interested in partnering up? I'm not a database dev, but I have some great ideas, and I'm already reaching out to investors to build something in AI, and potentially have a partner. I'd love to build a database that is basically like the midbrain of AI, a hybrid database built specifically for AI memories and memory relations. If you're open to collaborating and building a product, perhaps my ideas could be a good 'test case' and be mutually beneficial to all of us. email : patrickwcurl - gmail.

canadiantim · on April 20, 2023

Amazing! Thank you, all very encouraging answers. Congrats on everything you've achieved with Cozo so far!!

One last question if possible. Is there a recommended way to do Full Text Search on data stored in Cozo?

zh217 · on April 20, 2023

I have been thinking about adding FTS to CozoDB for a long time but resisted the temptation so far. The reason is that text search is language-specific: what works for one language does not work for another. There is simply no way that CozoDB can duplicate the work of a dedicated text search engine for all the languages in the world.

Our current solution is to use mutation callbacks to synchronize texts to a dedicated text search engine. The callback API is specific to each host language: for example, for Python: https://github.com/cozodb/pycozo#mutation-callbacks , and for Rust: https://docs.rs/cozo/latest/cozo/struct.Db.html#method.regis...
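The callback-based sync described above might look roughly like the following sketch. Everything here is an assumption for illustration: `ToyTextIndex` is a stand-in for a real text search engine, and the callback shape `(op, new_rows, old_rows)` with `"Put"`/`"Rm"` operations and `[doc_id, text]` rows is assumed rather than taken from the linked docs. The calls at the bottom simulate the database invoking the callback; in a real setup you would register the function with the database instead.

```python
from collections import defaultdict


class ToyTextIndex:
    """Stand-in for a dedicated text search engine: a tiny inverted index."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of doc ids

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def remove(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].discard(doc_id)

    def search(self, token):
        return sorted(self.postings.get(token.lower(), set()))


index = ToyTextIndex()


def on_mutation(op, new_rows, old_rows):
    """Assumed callback shape: op is "Put" or "Rm"; rows are [doc_id, text]."""
    # Drop stale postings for rows that were overwritten or removed.
    for doc_id, text in old_rows:
        index.remove(doc_id, text)
    # Index the new text only when rows were written.
    if op == "Put":
        for doc_id, text in new_rows:
            index.add(doc_id, text)


# Simulated mutations: insert, update, then delete the same document.
on_mutation("Put", [[1, "hello world"]], [])
after_insert = index.search("hello")
on_mutation("Put", [[1, "goodbye world"]], [[1, "hello world"]])
after_update = (index.search("hello"), index.search("world"))
on_mutation("Rm", [[1, "goodbye world"]], [[1, "goodbye world"]])
after_delete = index.search("world")
```

Tokenization is the language-specific part: the whitespace split here would be replaced by whatever the chosen search engine does for the languages involved.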

infogulch · on April 21, 2023

Sonic [1] might be a good fit, though it is not yet factored into a separate library [2].

[1]: https://github.com/valeriansaliou/sonic

[2]: https://github.com/valeriansaliou/sonic/issues/150

canadiantim · on April 21, 2023

Thank you, that makes sense. Plus, with vector search there seem to be ways of shoehorning FTS into it. Could also potentially use the SQLite storage and piggyback off SQLite FTS5, but I'm not sure how well that setup would work.

infogulch · on April 22, 2023

What about branching?

zh217 · on April 20, 2023

Wow, cellular sheaves, that's a connection I haven't thought of before!

zh217 · on April 20, 2023

Thank you!

zh217 · on April 20, 2023

Thanks for the suggestion--will surely do that!

freezed8 · on April 20, 2023

would love to have CozoDB be a part of llamaindex too! have a bunch of integrations with existing vector db's https://github.com/jerryjliu/llama_index/tree/main/gpt_index...

zh217 · on April 20, 2023

For the parquet question: currently CozoDB is developed by a single developer (me), but I am starting to explore ways of expanding the development team. Certainly a lot more features will be added if that happens, and parquet support looks like a really useful one.

anonzzzies · on April 20, 2023

Any contact info for you to discuss contributing?

zh217 · on April 20, 2023

Yes, this is correct: only the query's answer set needs to be in memory. We are also working on streaming for the Rust API, in which case you don't even need to keep the whole set in memory for simple queries.

FYI, here is a not very rigorous performance and memory usage analysis (for a previous version without the vector search capability): https://docs.cozodb.org/en/latest/releases/v0.3.html

luizfelberti · on April 20, 2023

Thanks for the speedy response!

Cozo is looking like a top-contender for my project so far :)

zh217 · on April 20, 2023

Not for the moment, it is not polished enough. Right now it is just a webapp written in React and prosemirror running on top of a CozoDB instance. And it is very rough around the edges (good enough for myself, but maybe not for others).

Once local LLMs that are powerful enough become available, though, I think I will try to find time to polish and publish it, since it can then act as a showcase for what a thinking agent can achieve.

lupickup · on April 23, 2023

How are you modeling the notes in CozoDB? I'd be interested in parsing my Obsidian notes into CozoDB as a way to set up alternate views on them. Curious how you thought about storing the bullets and the relations between them.