OP here, are you asking about hardware or a cloud instance type?
I would recommend using GPUs, at least to initialise the DB. You need a lot of compute once at the start to vectorize all the data, and then regularly (but less intensively) to vectorize updated data as it syncs.
You can check out the Weaviate docs here if you want more info: https://weaviate.io/developers/weaviate
Yo! I can't seem to replicate this. If you don't mind, could you send me a screenshot and a few more details, like which browser you're using, whether you're on Windows or Mac, and maybe the console output? You can DM me on https://x.com/yoeven or email me at yoeven@jigsawstack.com :)
Agreed. It seems like this has nothing to do with agents, in particular because this is well-trodden territory that frameworks like Airflow have been tackling since _well_ before the recent deep learning craze.
I would say that unless the agent-based examples are highly compelling, it makes more sense to simply remove the agent stuff from the pitch entirely, lest you inevitably be accused of riding the current AI hype with an otherwise unrelated piece of technology (one that happens to be a good fit for agent-based workflows - something I haven't observed to work very well even with the best models).
For me, an agent is essentially a decision-making machine. I went through many iterations of software for building agents, and Flow is the culmination of all of those learnings.
For some reason the LLM-specific examples just slipped my mind, because I really wanted to show the barebones nature of this engine and how powerful it is despite its simplicity.
But you're also right: it's general enough that you can build any task-based system, or rebuild a complex system on a task architecture, with Flow.
The signal is clear: add more agent-specific examples.
It supports tasks, dynamic routing, and parallel execution using pure Python built-ins (zero deps). But it's just a side project, so there's no persistence; it's just a simple tool.
Totally makes sense, don't know why I missed agent-specific examples... will add them ASAP.
I focused on the agent use case because it was initially built for that internally. But you are right: for the release I made it extremely general, to be used as a foundation for complex systems.
Since the homepage didn't explain what it does (and simply had an upload), I visited the FAQs. But the FAQs are for a different project. You may want to fix that.
The website, examples, and usage look clean. Kudos! I have some questions about what's happening under the hood that weren't evident from an initial read of your website and GitHub.
1. Does subscribe listen for new changes on a transient server (just a queue), or from a more persistent store?
2. Where do the events persist? I didn't see a connector to postgres. I did see one for s3.
3. What is the default persistence layer you are advocating?
4. Let's say you run 3 instances of the self-hosted server, a random one of them gets the teacher, and 2 other random students get load-balanced to the two other servers. How does the teacher get all the messages? What does the thread look like in a distributed setting?
5. How do you filter messages, e.g. only those since time T?
6. Pagination / limits to avoid any avalanche?
7. Auth? Custom auth/jwt?
8. REST API to produce?
9. Are consumers restricted to browsers? What about one in Node?
10. BONUS: Have you tested whether this works embedded as an iframe, or embedded in a native/React Native mobile app?
I answered 1-5 in my other reply and hit save, so here's more:
6. There is a limit parameter on the query API, and the underlying data structures use async iterator patterns, so we have a go-forward path to a streaming implementation (data/queries larger than memory). For now, though, the decrypt implementation is eager instead of lazy, so that's the first place we'd focus to make larger-than-memory workloads a non-problem.
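Roughly, capping a query looks something like this (an illustrative sketch; the exact import path and option names may differ from the current release):

```ts
import { fireproof } from "@fireproof/core";

const db = fireproof("chat");

// Cap the number of rows returned by an index query.
// Sketch only: treat the import path and option names as assumptions.
const recent = await db.query("type", { key: "message", limit: 20 });
console.log(recent.rows.length); // at most 20
```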
7. Other embedded databases don't have auth, but we are network-aware so it's a different ballgame. Our next step is read/write access control on a per-ledger basis.
UCAN capability delegation allows us to keep an embedded mindset here, in that authorization becomes a matter of data validity, not something that has to be fetched from a centralized resource. How it works: client device agents generate non-extractable keypairs (like Passkeys) and can link them to account principals via any signing endpoint Fireproof trusts (for starters, just the one we run; to an end user it looks like clicking a validation link in an email). Agents create a new cloud database clock register by locally generating an ephemeral keypair that signs itself over to the principal. Our centralized clock register endpoint only allows updates to the resource identified by the clock's public key ID from agents that have a valid signed delegation chain to the ephemeral key.
To a developer it will look something like `db.share("bob@example.com")` and now Fireproof Cloud will let Bob read and/or write the db also.
What's cool about this is that access control changes are just data manipulations, so they can happen offline. And the valid delegations can be safely delivered over any channel. In fact there are no secrets in this system except for the non-extractable keypairs.
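Conceptually, the check the register endpoint does looks something like this (a simplified sketch, not the actual implementation; the types and the verify helper are illustrative):

```ts
// Simplified conceptual sketch -- illustrative types, not the real implementation.
interface Delegation {
  issuer: string;          // public key ID of the delegator
  audience: string;        // public key ID being granted the capability
  signature: Uint8Array;   // issuer's signature over the delegation
}

interface ClockUpdate {
  clockId: string;         // the clock's public key ID (the resource)
  signer: string;          // the device/ephemeral key that signed this update
  proof: Delegation[];     // chain from the clock's key down to the signer
  signature: Uint8Array;
}

type Verify = (publicKeyId: string, payload: unknown, sig: Uint8Array) => boolean;

// An update is accepted only if its signer can show a delegation chain rooted
// at the clock's own key. No centralized ACL lookup: validity is a property of
// the data itself, which is why delegations can be minted offline and shipped
// over any channel.
function isAuthorized(update: ClockUpdate, verify: Verify): boolean {
  let expectedIssuer = update.clockId;
  for (const link of update.proof) {
    if (link.issuer !== expectedIssuer) return false;
    if (!verify(link.issuer, { audience: link.audience }, link.signature)) return false;
    expectedIssuer = link.audience; // walk down the chain
  }
  return (
    expectedIssuer === update.signer &&
    verify(update.signer, { clockId: update.clockId }, update.signature)
  );
}
```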
If you are thinking to yourself "what about revocation?" -- we are hiring.
8. The sync endpoint exposes the minimal blob k/v (no list) and register APIs, and it can all be floated on top of any raw KV with check-and-set semantics if needed (there's a rough sketch of that surface below).
We have plans for a REST API in Fireproof Cloud, where if you allow the cloud to decrypt and process your data, we can give you raw queries instead of you replicating and then querying locally. I am thinking a CSV output here would be a good place to start.
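The minimal surface mentioned above amounts to something like this (an illustrative sketch, not the actual gateway interface):

```ts
// Illustrative sketch of the minimal sync surface -- not the actual gateway interface.
interface BlobStore {
  put(key: string, bytes: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
  // Note: no list(). Blobs are opaque and encrypted; the register is the index.
}

interface ClockRegister {
  head(): Promise<string[]>; // current head blob reference(s)
  // Check-and-set: only advance if the caller saw the latest value.
  // This is the primitive that makes the register multi-writer safe.
  advance(expected: string[], next: string[]): Promise<boolean>;
}
```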
9. Runs great anywhere JS runs. We have examples (like CatBot linked above) that subscribe to the ledger on the backend and operate locally, often responding to user events. So the DB is acting as an RPC bus... this is a common pattern in CouchDB, so I made sure Fireproof works great like that.
To run in an edge function, you usually aren't gonna replicate to local filesystem, instead you can configure the database to read and write directly with the cloud store. Because of the eager decrypt we do, this is actually pretty fast and not that chatty.
10. The CodePen demo on our homepage is an iframe, works great. We have a contributor (I think I see in the thread here) who is working on React Native -- most of the heavy lift is done, but our gateway interface is only now settling down to where it makes sense to finalize the integration. I have also done Socket Supply for mobile and that works great.
These are awesome questions, I'll try to fold the answers into the docs also.
1. The embedded database subscribes to the remote sync endpoint when it is connected. This subscription might be polling, websocket, or anything else. The local embedded database will try to keep up with changes anyone pushes to the remote endpoint. This is more a backend mechanical thing than an API you'll see.
Your code can subscribe to the local database -- this is just the JavaScript event loop, and any update, local or remote, will cause your callback to run. The upshot is that all you have to do is connect your database to the sync endpoint and it will stay up to date, and you can also connect your UI to the database via `db.subscribe()`.
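Put together, it looks roughly like this (a minimal sketch; the exact import path and signatures may differ from the current release):

```ts
import { fireproof } from "@fireproof/core";

const db = fireproof("classroom");

// Local subscription: the callback runs for local writes *and* for changes
// pulled in from the remote sync endpoint, so the UI doesn't care where an
// update originated. (Sketch -- exact import path and signatures may differ.)
const unsubscribe = db.subscribe((docs) => {
  console.log("changed docs:", docs);
});

// Writes land locally first; subscribers here and on any synced peer see them.
await db.put({ _id: "msg-1", type: "message", text: "hello class" });

// Later, when tearing down the view:
unsubscribe();
```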
2. Updates are written to local storage (IndexedDB or the filesystem) as encrypted blobs. These are then replicated to the cloud without being parsed by the cloud. We have SQL connectors also, but we haven't done the Postgres-specific work yet (we just started designing it). That is the data side. There is also the clock register, which the client updates to point to the most recent blob. This register is multi-writer safe, and can occasionally point to more than one "head" blob, in which case the client does a deterministic merge on read.
3. In my experience most people use the defaults, so we have Fireproof Cloud, which uses R2 and Durable Objects. We also have a SAM template for AWS and a connector for Netlify, in addition to things that are more like parts for building your own backend (file and HTTP endpoints).
4. Each ledger replicates 100% when it syncs, so all hosts have the same data (no sharding within a ledger). Typically you have one centralized endpoint to sync via (p2p is possible, but you'd end up contributing some plumbing to the project, I bet). So in this case the class would have a URL that is the sync point, and everyone would pull from it periodically or via streaming.
Merges are idempotent, deterministic, associative, and commutative, so it doesn't matter what order the teacher and students apply updates to their local instance: once all updates are applied, they have the same state.
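A toy illustration of that algebra (not Fireproof's actual merge, just why order can't matter for a union-like, idempotent merge):

```ts
// Toy model: a "state" is just the set of update CIDs a replica has applied.
// Not Fireproof's real data structure -- only an illustration of the algebra.
type State = Set<string>;

function merge(a: State, b: State): State {
  return new Set([...a, ...b]); // union: idempotent, commutative, associative
}

const teacher: State = new Set(["cid-a", "cid-b"]);
const student1: State = new Set(["cid-c"]);
const student2: State = new Set(["cid-b", "cid-d"]);

// Apply the same updates in different orders...
const order1 = merge(merge(teacher, student1), student2);
const order2 = merge(student2, merge(student1, teacher));

// ...and every replica converges to the same state.
console.log([...order1].sort().join() === [...order2].sort().join()); // true
```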
5. The e2e encryption means you'd have to give the keys to the server to allow it to create subsets for sync, so we haven't done that yet. Our next optimization is to sync the readonly current dataset first, then any extra data needed for writing, and only when necessary, the historical log. This still doesn't solve the subset sync issue, but will benefit all use cases immediately.
wrt (3): Being able to self-host is extremely important. I noticed the docs focus a lot on the Quickstart/client usage, but things like setting the default storage engine via an ENV var and the storage path via an ENV var are very important.
hmm. Replicate state to all clients. Ok.
Seems like an opinionated but well thought through project. Godspeed!
Thanks, and thanks for the encouragement to fully document the gateway interface. We have been flux-ing it lately but as soon as it settles down we’ll do that.
The vision is many small ledgers, so the full replication per ledger makes sense, but we have work to do on cross-ledger queries.
Some questions remain unanswered from your home page (from someone at a mid-sized firm writing their "build-it-yourself" version of the same goal at largish scale):
1. How often is each campaign run?
2. How does it avoid re-sending the same campaign to the same segment over and over? You really should constrain your queries with a WHERE clause covering the last N minutes to avoid sending to all users all the time, where N is the answer to (1).
3. Is there a way to send reminders, with exponential backoff?
4. My first instinct was to search for Postgres and MySQL; neither was present. Consider stating which databases are supported, and then mention your ORM and which DBs it supports.
5. It seems like the only channel right now is email, but it wasn't clear whether the channels could be push/SMS/etc. in the future. Companies are likely already accustomed to multi-/omni-channel, so a roadmap would help.
6. Your docs mention Postmark as the email channel, but I couldn't find a list of supported email APIs. A way to add your own vendor would be great for contributors.
7. Tools and alternatives to Temporal/Inngest/Hatchet/Trigger/Zeplo/Mergent/Restate are welcome, and this one seems focused on messaging/comms.
1. Currently campaigns are one-shot: users enter once and progress through until the generator flow returns. No auto re-entry yet, but it's planned for 'dynamic' campaigns. They are triggered by events that you can send to the API, or by a user automatically entering a segment.
2. See 1
3. Yes, the campaign flow definition is "just TypeScript". It's Turing-complete and runs in a QuickJS VM, so it supports exponential backoff.
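For example, a reminder flow with exponential backoff could look roughly like this (an illustrative sketch; the `sendEmail`/`sleep` step helpers are assumptions, not necessarily the actual flow API):

```ts
// Illustrative sketch of a generator-style campaign flow with exponential
// backoff. `sendEmail` and `sleep` are assumed step types, not necessarily
// the actual API of this project.
type Step =
  | { kind: "sendEmail"; template: string }
  | { kind: "sleep"; ms: number };

function* reminderFlow(maxReminders: number): Generator<Step, void, boolean> {
  let delayMs = 60 * 60 * 1000; // start with a 1-hour wait
  for (let i = 0; i < maxReminders; i++) {
    const converted = yield { kind: "sendEmail", template: "reminder" };
    if (converted) return; // user acted, campaign is done
    yield { kind: "sleep", ms: delayMs };
    delayMs *= 2; // exponential backoff between reminders
  }
}
```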