It's still in very early stages (although I am using it in production for my company)
It's very similar to Debezium (mentioned in another comment), but it's built with Phoenix (elixir), so great for listening via websockets.
Basically the Phoenix server listens to PostgreSQL's replication functionality and converts the byte stream into JSON which it then broadcasts over websockets. This is great since you can scale the Phoenix servers without any additional load on your DB. Also it doesn't require wal2json (a Postgres extension). The beauty of listening to the replication functionality is that you can make changes to your database from anywhere - your api, directly in the DB, via a console etc - and you will still receive the changes via Phoenix.
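If you're curious what consuming this looks like from the browser, it's roughly the sketch below, using the standard phoenix JS client. The topic and event names are placeholders rather than a settled protocol.

    import { Socket } from "phoenix";

    // Connect to the Phoenix endpoint that is fed by the replication stream.
    const socket = new Socket("wss://your-server.example/socket", {
      params: { token: "user-jwt" },
    });
    socket.connect();

    // Topic naming here ("realtime:<schema>:<table>") is made up for illustration.
    const channel = socket.channel("realtime:public:users", {});
    channel.join()
      .receive("ok", () => console.log("listening for changes"))
      .receive("error", (err) => console.error("failed to join", err));

    // Each event carries the JSON-decoded change from the replication stream.
    channel.on("INSERT", (payload) => console.log("new row", payload));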
I still have to document a lot of how it works and how to use it, but if anyone is interested then I will make it a priority over the weekend
There is a big difference between the changestream of many SQL and noSQL databases, and what RxDB does. Having a stream of changes is useful but not the whole solution. RxDB is capable of using single document changes of a stream and recalculate the new results of an existing query. This saves you not only much IO performance but makes developing much easier. See https://rxdb.info/query-change-detection.html
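As a rough sketch of what that looks like from application code (the exact query syntax has shifted a little between RxDB versions):

    import type { RxCollection } from 'rxdb';

    // A live query: the observable re-emits the recalculated result set whenever a
    // matching document changes, without re-running the whole query against storage.
    export function watchOpenTodos(todos: RxCollection) {
      return todos.find({ selector: { done: false } }).$
        .subscribe((docs) => {
          console.log('open todos:', docs.map((d) => d.toJSON()));
        });
    }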
Yeah, perhaps "same functionality" is an ambitious statement. I just saw Debezium mentioned below and thought I'd throw this out in case someone reading this finds it fits their needs!
I would love to use this just for the offline and query change detection capabilities alone - but the latter is currently beta and disabled by default. Is there a reason for that?
RethinkDB might be another database to look at if this is functionality which you are interested in. My only gripe with doing this sort of thing is that if you offer a public API, it becomes separate to what you are using for the application internally. There are also some GraphQL databases like Prisma that offer this sort of functionality.
Make this good enough and you could very well cut into a huge chunk of Firebase's business model.
I'd seriously love to have query subscriptions, especially if the library is robust enough for use on mobile.
One large differentiator of Firebase's DB offerings is the subscription functionality, on mobile and web, but using that means buying into NoSQL. A proper subscribable SQL database would be amazing.
Of course the conflict resolution on writes, and the local caching is another key benefit Firebase offers. Not sure how that could be done with a real relational DB!
Mainly because I want to keep my team small, so it’s easier if we all commit to Postgres. But we also had a hard time with Firebase’s filtering on subcollections (for their new Firestore)
Interesting, thanks for sharing how you're doing it.
Is there any mileage in doing this with triggers? I have a _very_ legacy system which needs caching adding. Rather than dig through the code to invalidate the cache every time a record is updated/deleted in 20+ tables, I am thinking that being able to listen to the SQL executed and invalidate the cache based on the tables involved would be a clean approach.
But I've not found any way to make that possible - yet.
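The trigger route might look roughly like the sketch below: a generic trigger function that NOTIFYs the table name, plus a listener that drops the relevant cache keys. All names are invented, and the payload is only the table name, so it stays well under NOTIFY's roughly 8kB payload cap.

    import { Client } from 'pg';

    // One-time SQL setup: a generic trigger function that reports which table changed.
    const setupSql = `
      CREATE OR REPLACE FUNCTION notify_table_change() RETURNS trigger AS $$
      BEGIN
        PERFORM pg_notify('cache_invalidation', TG_TABLE_NAME);
        RETURN NULL;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER orders_changed
        AFTER INSERT OR UPDATE OR DELETE ON orders
        FOR EACH STATEMENT EXECUTE PROCEDURE notify_table_change();
    `;

    async function main() {
      const client = new Client({ connectionString: process.env.DATABASE_URL });
      await client.connect();
      await client.query(setupSql);
      await client.query('LISTEN cache_invalidation');
      client.on('notification', (msg) => {
        // msg.payload is the table name; drop whatever cache keys depend on it.
        console.log('invalidate cache for table:', msg.payload);
      });
    }

    main();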
There's no 8k limit with triggers? Could you expand on what you mean by that?
One of the motivating factors for not using triggers is that the implied overhead is very significant. By logging changes separately with triggers, the write volume is roughly doubled, and the overhead of insertions is much higher (a lot of the fast/bulk paths can't be used, and the trigger processing itself has to be performed).
You can use apache ignite as a caching layer over your db. (Or as a db itself). You can then register a ContinuousQuery to listen for changes. A grpc service can be stood up to stream these changes to other clients, even front end ones (via grpc-web).
Did you evaluate gRPC or gRPC-Web when considering your websockets implementation?
I'm using gRPC in other languages/frameworks and I'd really like to migrate one of those to Elixir in the near future, but I'm curious to know if anyone else has given it a try.
Having client state just be a replica of server state solves so many problems I don't understand why the concept never caught on. Pouchdb/couchdb are still the only ones doing it afaik.
Instead we have a bajillion layers of CRUD all in slightly different protocols just to do the same read or write to the database.
When your data is public and immutable, this approach is very pleasant. The client becomes just another caching layer and worst case it's presenting a historical version of the truth. You can even extend this across tabs with things like local storage.
This breaks down quickly once you have data that could become private or mutate rather than append.
Couchbase (with its sync gateway) uses "channels" to sync data. It even lets you change the channel of a doc, and to the sync client it shows as if the doc was deleted (if the user is not part of the new channel)
Yeah, couchdb more or less requires you to replicate data for individual users if you need complex permissions and want the user to access the couchdb directly.
Permissions in general need to be handled by custom reconciliation functions (dropping unauthorized changes) or some kind of nanny system that can react to changes.
For example, imagine blog posts as documents, and a list of comments inside that document. Instead of the user adding/changing the comment list, the user would add a record to a comment request list, and either the reconciliation process or a nanny service checks the requests and updates the comment list.
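To make the nanny idea concrete, here is a rough sketch of such a service tailing the _changes feed over plain HTTP. The database name, document fields and policy check are all invented.

    // Tail the changes feed and fold vetted comment requests into the post document.
    const COUCH = 'http://localhost:5984/blog';
    const isAllowed = (_req: unknown) => true; // placeholder for a real policy check

    async function runNanny(since: string = 'now') {
      while (true) {
        const res = await fetch(
          `${COUCH}/_changes?feed=longpoll&include_docs=true&since=${since}`
        );
        const body = await res.json();
        for (const change of body.results) {
          const doc = change.doc;
          if (doc?.commentRequests?.length) {
            doc.comments = [...(doc.comments ?? []), ...doc.commentRequests.filter(isAllowed)];
            doc.commentRequests = [];
            // doc still carries its _rev from the feed, so this is a normal update.
            await fetch(`${COUCH}/${doc._id}`, {
              method: 'PUT',
              headers: { 'Content-Type': 'application/json' },
              body: JSON.stringify(doc),
            });
          }
        }
        since = body.last_seq;
      }
    }

    runNanny();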
The much simpler solution of course is to not let the users have any write access to the couchdb and just use a REST API. But then you lose many of the benefits of couchdb...
I think the Firebase Realtime Database and Firestore have a good model for offline and being able to have private and mutable data. It does get complex but the Firebase SDKs do the heavy lifting for you here.
Can you share more about the private and mutable data functionality for Firebase? I used them a few years back, and never really understood how to do private data without building my own ACL inside Firebase.
Thanks for the response. I suppose my biggest question is not how I can store private data (I can just make it inaccessible via the right rules). But, it seems like I am then layering my own ACL system onto those rules. And, I never got a sense there was an easy way to write a test that simulated my rules against my data and made sure I was not accidentally creating a leaky rule.
In so many ways it is SO much easier to use Firebase because all the pieces are right there as compared to a DB + Server + Front End + Tooling. But, I still always worried that I would somehow leave a gaping hole in my data and not know about it.
And, I was never really sure how I can easily do joins across data without writing my own bespoke metalanguage inside Firebase. A link posted today on HN talked about how XML does turn out to be good for nested data (hence why it is used for UIs), and it feels like Firebase, being more or less JSON, loses in this respect.
That's just my experiences, and I say that loving Firebase.
Those two examples made me think: Firebase removes a lot of complexity for me, but it forces me to write my own layer of complex access and DB logic which I never felt fully qualified to do, and as such, just went back to using databases with an ORM and a backend server.
In reality data can't become private again, after being available. You may try to contact all users to delete their copy, but they may not respect that.
That's in theory. In actual reality, if you're making an app that has private data and is not crawled by bots, most of the time users don't save everything that they see.
I think the parent means clients (user agents), not actual human users. As soon as any state that needs to be hidden is exposed to clients, that's a security breach regardless of whether any human eyeballs have seen it.
This is true, too, but I meant users, too. It's all too common for users to screenshot things, etc. I see it a lot of the time on Twitter, for example. You can delete tweets, but oftentimes it's pointless. But I also regularly see my gf taking photos with her phone of various apps she uses on her notebook, just to have the info around on her phone. I suspect it's pretty common, because it's much more low-tech than saving pages or API scraping.
In short, server data is more normalized than the data client needs. As you get closer to view layer, your data gets denormalized further and further. Client-server interaction sits somewhere in the middle to both minimize the bytes-over-wire as well as the round-trips to the backend to get up-to-date.
Take a look at GraphQL; its central promise is to let the client choose the optimal data it needs (often denormalized through nested GraphQL queries) and fetch it in one batch.
It is not to say there shouldn't be a simple replica. It is just that if we want it to be a simple replica, we should have a server-side, mirrored, somewhat-denormalized representation rather than just the raw server data models.
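For illustration, here is a nested query against an imagined schema: the client states the denormalized shape the view needs and fetches it in one round-trip.

    // Imagined schema; the point is the nested, view-shaped selection.
    const query = `
      query PostPage($id: ID!) {
        post(id: $id) {
          title
          author { name avatarUrl }
          comments(last: 10) {
            body
            author { name }
          }
        }
      }
    `;

    async function loadPostPage(id: string) {
      const res = await fetch('/graphql', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query, variables: { id } }),
      });
      const { data } = await res.json();
      return data.post;
    }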
Data security is a huge issue: Facebook.com has very specific whitelisted access patterns encoded as CRUD endpoints. 1) Users want to make sure their data is used in non-creepy or non-stalky ways; 2) Facebook's business needs to control the access point so they can serve you ads or otherwise monetize. So the API exposes only limited access patterns, the API TOS disallows caching, and they go to great lengths to prevent scrapers.
If immutable fact/datom streams with idealized cache infrastructure becomes a thing (and architecturally i hope it does) it's going to need DRM to be accepted by both users and businesses.
I'm assuming that the client state would be a lazy representation of the back end state, that only pulls data as needed. The result being that local and server state must both be treated only as asynchronously accessible.
There is a limit on what data can be replicated with the client.
For a simple chat-app you can replicate all messages of a user. But you would never replicate the whole state of wikipedia to make it searchable.
Wrong. This is popular anti-Meteor FUD spread by people who don't know how to use its features properly, or who don't have the engineering/computer science background to design a system that can manage computational complexity or scalability.
In 2015, my business implemented a Meteor-based real-time vehicle tracking app utilising Blaze, Iron Router, DDP, and Pub/Sub.
Our Meteor app runs 24hrs/day and handles hundreds of drivers tracking in every few seconds whilst publishing real-time updates and reports to many connected clients. Yes, this means Pub/Sub and DDP.
This is easily being handled by a single Node.js process on a commodity Linux server consuming a fraction of a single core’s available CPU power during peak periods, using only several hundred megabytes of RAM.
How was this achieved?
We chose to use Meteor with MySQL instead of MongoDB. When using the Meteor MySQL package, reactivity is triggered by the MySQL binary log instead of the MongoDB oplog. The MySQL package provides finer-grained control over reactivity by allowing you to provide your own custom trigger functions.
Accordingly, we put a lot of thought into our MySQL schema design and coded our custom trigger functions to be as selective as possible, to prevent SQL queries from being needlessly executed and wasting CPU, IO and network bandwidth by publishing redundant updates to the client.
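The shape of those trigger conditions is roughly the following (illustrative only, not the package's exact API; the table and columns are invented):

    // Re-run the publication's query only when the changed binlog row could actually
    // affect this subscriber's result, instead of on every write to the table.
    function positionTriggerFor(vehicleId: number) {
      return {
        table: 'positions',
        condition: (changedRow: { vehicle_id: number }) =>
          changedRow.vehicle_id === vehicleId, // skip rows belonging to other vehicles
      };
    }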
In terms of scalability in general, are we limited to a single Node.js process? Absolutely not - we use Nginx to terminate the connection from the client and spread the load across multiple Node.js processes. Similarly, MySQL master-slave replication allows us to spread the load across a cluster of servers.
For those using MongoDB, a Meteor package named RedisOplog provides improved scalability with the assistance of Redis's pub/sub functionality.
That's exactly the model that the Apollo Client library uses (GraphQL-based data store for React), and teams I spoke to that tried it are quite enthusiastic for this reason.
Same way you scale replication for any server, by sharding and only replicating the shards you care about.
The "shard" could just be that users own feed in this case. Then you get offline for free where user adds a tweet and it appears immediately, replicating back to server when he goes back online. The server replica side will need to be a lot more complicated to deal with broadcasting but I don't see why it won't work.
I attempted something similar on a current project; the problem is with initial data loading. If you are hitting the URL/page for the first time, you are waiting minutes or more for non-trivial data sets.
Why not just load what is needed and hydrate the data over time? What about datasets where you need pagination/ordering, etc., and the only way to guarantee order is to pull the whole set?
In twitter-like applications, the default ordering is usually just ORDER BY timestamp DESC. You could rely on this default ordering to load the first few dozen items on first visit, and load the remainder asynchronously. Sort of like automatic infinite scrolling.
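Concretely, that could be keyset pagination over the default ordering, something like this (table and columns invented):

    import { Pool } from 'pg';

    const pool = new Pool({ connectionString: process.env.DATABASE_URL });

    // First call: newest items. Later calls pass the oldest timestamp already shown,
    // so older history hydrates in the background.
    async function loadTimeline(userId: number, before?: Date, pageSize = 50) {
      const { rows } = before
        ? await pool.query(
            `SELECT * FROM tweets
             WHERE user_id = $1 AND created_at < $2
             ORDER BY created_at DESC LIMIT $3`,
            [userId, before, pageSize]
          )
        : await pool.query(
            `SELECT * FROM tweets
             WHERE user_id = $1
             ORDER BY created_at DESC LIMIT $2`,
            [userId, pageSize]
          );
      return rows;
    }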
Of course, users with limited RAM and metered connections won't like that. Which is another reason why it didn't catch on.
I've heard somewhere that Twitter maintains a copy of all tweets in a user's timeline for that user.
If I had to make a Twitter clone with CouchDB, I would probably have one timeline document per user, and maybe one per day to limit the syncing bandwidth.
I tried out Postgraphile Realtime and it's pretty cool.
For anyone who wants something similar that's not GraphQL, I'm in the early stages of developing a Phoenix (Elixir) implementation which broadcasts changes over websockets: https://github.com/supabase/realtime
You can do this in datomic! I made an end-to-end proof of concept of this using datomic and websockets. The client can open a websocket and subscribe to a query and then the server will react to changes in the database and automatically update the model and therefore the UI (no eventing code required).
I think you have not understood what RxDB is.
It is a client-side database. It also works when the client is offline; it does not need a stable internet connection over a websocket.
Your example is similar to RethinkDB and others: a websocket streaming JSON.
A more common solution includes using Datascript[1] as the client side DB and then using something like Posh[2] or Datsync[3] to handle query/pull subscription changes.
Another DB that plays in that field is WatermelonDB (https://github.com/Nozbe/WatermelonDB), which supports React Native (based on SQLite) and the web (based on the LokiJS in-memory DB).
General question about reacting to database events: It seems like when responding to DB events it would be easy to accidentally create an infinite loop. Is that an issue with this pattern, or is it easy to avoid? Do any of these data subscription tools have safeguards in place to prevent this?
This is where command and query (with subscribe) must be neatly separated.
An event A launches an update of the DB, which launches the query part to react, and then stops. There is a risk of an infinite loop if event A is itself launched by the query part, which is very much related to the behavior of your app.
Isn't this the case with any interaction between 2 systems? Between a client and server, you can have logic in the client that reacts to a response from the server that triggers another request to the server.
Very true, maybe I'm overthinking it? In my head: if I update full_name from first_name and last_name whenever a user table row changes, doing this naively means I update the user on every change and trigger a continuous loop of updates.
I have certainly created infinite loops while using React's componentDidUpdate, maybe it's just important to define triggers on single attributes rather than entire database rows.
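For that specific example, the usual fix is to react only to the input columns and to skip the write when nothing would change; either check alone breaks the loop. A rough sketch, with invented names:

    interface UserRow {
      id: number;
      first_name: string;
      last_name: string;
      full_name: string;
    }

    declare function updateUser(id: number, patch: Partial<UserRow>): Promise<void>; // placeholder

    async function onUserChanged(oldRow: UserRow, newRow: UserRow) {
      const inputsChanged =
        oldRow.first_name !== newRow.first_name || oldRow.last_name !== newRow.last_name;
      if (!inputsChanged) return; // e.g. the event was caused by our own full_name write

      const fullName = `${newRow.first_name} ${newRow.last_name}`;
      if (fullName === newRow.full_name) return; // nothing to do, no second update fired

      await updateUser(newRow.id, { full_name: fullName });
    }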
The ideal case is that the DB can work out the difference in the subscribed query caused by the DB update so that the front end can make changes incrementally and doesn't have to rerender the entire list/table.
Naively it sounds like the halting problem. You would have to verify that no consumer can make a change to the data set you are listening to. You would want to decide yourself when not to respond to an event.
Isn't "subscribe to all state-changes like the result of a query" something you'd implement efficiently using a forward chaining inference engine, like something based on Rete?
I don't know too much about database systems, but if we were in an inference (e.g., production) system, then that would make a lot of sense! I build logical inference systems and have implemented this kind of functionality in forward, backward, and bi-directional logical inference.
So basically when a change-event comes, RxDB does not run the query against the database again, but instead uses the old results together with the event to calculate the new results.
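Conceptually it is like the sketch below (just an illustration, not the actual implementation): fold one change event into the previous result set instead of re-querying.

    // Previous results of "all todos where done === false, sorted by createdAt".
    interface Todo { id: string; done: boolean; createdAt: number; }
    type ChangeEvent = { op: 'INSERT' | 'UPDATE' | 'DELETE'; doc: Todo };

    function applyEvent(previous: Todo[], ev: ChangeEvent): Todo[] {
      const without = previous.filter((d) => d.id !== ev.doc.id);
      const stillMatches = ev.op !== 'DELETE' && ev.doc.done === false;
      if (!stillMatches) return without; // document left the result set (or was deleted)
      return [...without, ev.doc].sort((a, b) => a.createdAt - b.createdAt); // (re)inserted in order
    }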
If you want a change-stream out of your regular MySQL or PostgreSQL database systems, check out Debezium: https://debezium.io/
It basically registers itself as a fake replica server so that it can get updates from the master (like binlog in case of MySQL) and then it forwards those updates to Kafka. The possibilities are endless what you can do with those updates as Kafka consumers.
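Consuming those updates from Node looks roughly like this with kafkajs; the topic name is the one from Debezium's tutorial, so substitute your own connector's server/schema/table names.

    import { Kafka } from 'kafkajs';

    const kafka = new Kafka({ clientId: 'change-consumer', brokers: ['localhost:9092'] });
    const consumer = kafka.consumer({ groupId: 'change-consumers' });

    async function run() {
      await consumer.connect();
      await consumer.subscribe({ topic: 'dbserver1.inventory.customers', fromBeginning: false });
      await consumer.run({
        eachMessage: async ({ message }) => {
          if (!message.value) return; // tombstone
          const change = JSON.parse(message.value.toString());
          // Debezium's envelope carries op ('c'/'u'/'d') plus before/after row images.
          console.log(change.payload?.op, change.payload?.after);
        },
      });
    }

    run();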
There is a big difference between the changestream of many SQL and noSQL databases, and what RxDB does.
Having a stream of changes is useful but not the whole solution.
RxDB is capable of using single document changes of a stream and recalculate the new results of an existing query. This saves you not only much IO performance but makes developing much easier. See https://rxdb.info/query-change-detection.html
Maybe this is an uninformed comment, but how is this different from the Firestore database in Firebase where you can listen for when a document changes?
hey, this is great! I've been building my own version of exactly this concept based on an event-sourced in-memory graph for my visual programming environment and it works really well for creating a collaborative editing system. I'll be investigating this project now to see if I can't use it instead of my hand-rolled solution.
This is something Realm does really well, both locally on the device and also with live subscriptions to data in the cloud.
We used this for our mobile apps and the experience was pretty awesome. The ability to live-observe both individual objects in the DB and results of queries makes building reactive UIs a very pleasant experience.
Nice! Reminds me of the Ryzom project in Python/Django/Postgres, which implements a meteorjs-like protocol and proposes the same example with a todo list.
What I personally like about RxDB is that it also works completely offline. However, I don't really understand the encryption: as far as I understood, it encrypts on the client side with the db password, which is sent to the server.
Another issue is authorization and authentication; I could not find a good solution for CouchDB. Couchbase seems to have better solutions for this, but the premium plan seems really expensive, and as far as I understood you need a "server" and a "sync server", which don't have low system requirements, at least for me.
No you normally do not send the password to the server.
You can ask the user to enter the password when the application is opened.
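Roughly like the sketch below; it uses the older RxDB.create() call (newer versions spell it createRxDatabase()), but the point is the same: the password is supplied locally at creation time and the passphrase never has to leave the device.

    import RxDB from 'rxdb';

    async function openDatabase() {
      const password = window.prompt('Enter your passphrase') ?? '';
      return RxDB.create({
        name: 'appdb',
        adapter: 'idb', // needs the pouchdb idb adapter plugin registered
        password,       // encrypts the fields marked as encrypted in the schema
      });
    }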
Authentication is much easier when you use the GraphQL replication. There you are much more flexible about which data you return depending on which user is asking for it.
RethinkDB is not offline first. It runs queries on the server-side, not on the client.
There is Horizon, which can replicate with RethinkDB, but the project is dead, with no commits since the RethinkDB company gave up.
So I was one of the developers of RethinkDB's Horizon, and I'm really glad to see someone take the database-change-stream-to-RxJS concept to its logical conclusion.
This project looks really awesome, and very much has the shape we wished Horizon could have turned into.
The community has taken it over as far as I know. Looking at the github there appears to be some active development still.
https://github.com/rethinkdb/rethinkdb
I have made something similar trying to match more the concept of a materialized view, but just on the client instead of inside the database: https://github.com/tozd/node-reactive-postgres
I do not think that Meteor will scale better than your GraphQL server with whatever database you want to have.
With Meteor you are bound to a specific ecosystem, which is often a pain. RxDB does only one thing: it is a client-side database. Everything else in your stack is free to choose.
RxDB is offline first. You can still query your data even when the user has no internet. Hasura will not make your app workable without a stable connection to the server.
How useful is real-time data really? For example, HN isn't updating in real time and for me that's fine. If real-time updates come at a ridiculous complexity and performance cost, then is it really worth it?
See also Google Wave, which had "real time" as its main novelty, but ultimately failed.
for my project it's very useful for collaborative features. Multiple people need to be able to edit the same graph structures at the same time, which would be difficult to scale out via a notify-and-poll architecture.
I've found that collaborative editing can be very confusing, and prefer a commit/merge step over simultaneous editing. I think all collaborative editing software should at least offer this kind of interface.
I tried several times to understand what GUNdb does and how it's different or which features it has.
I have given up. As far as I can tell it is something between a blockchain, a graph database, and sync.
In fairness, it was a bit pretentious of whoever decided to call it real-time, which is a term with a specific meaning and a very hard-earned reputation. 'Live' might have been better.
Depends on your definition of realtime.
Realtime with RxDB is not "Real-time Computing" but realtime synchronisation as described by Firebase: https://firebase.google.com/docs/database