You write the update directly to the cache closest to the user and into the eventually consistent queue.
We did this at reddit. When you make a comment, the HTML is rendered and put straight into the cache, and the raw text is put into a queue to go into the database. Same with votes. I suspect they do this client-side now, making the client the closest cache to the user, but back then it was the server cache.
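A rough sketch of that write-through pattern. The cache, queue, and renderer here are stand-ins I made up for illustration, not reddit's actual code:

```python
import queue

page_cache = {}                 # stands in for memcached or similar
db_write_queue = queue.Queue()  # stands in for the eventually consistent queue

def render_comment_html(raw_text):
    # Trivial stand-in for the real markdown-to-HTML renderer.
    return "<p>" + raw_text.replace("&", "&amp;").replace("<", "&lt;") + "</p>"

def post_comment(thread_id, raw_text):
    html = render_comment_html(raw_text)
    # 1. Write the rendered HTML straight into the cache readers hit,
    #    so the comment is visible immediately.
    page_cache.setdefault(thread_id, []).append(html)
    # 2. Enqueue the raw text for the asynchronous write to the database.
    db_write_queue.put((thread_id, raw_text))

post_comment("t3_abc", "first!")
# page_cache["t3_abc"] now shows the comment before the DB write has happened
```

The point is that the user-visible state is updated synchronously while the durable write trails behind in the queue.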
In his book Big Data, Nathan Marz (the article author) describes this and calls it the Speed Layer. I haven't fully finished the article yet, but the components it's describing seem to be equivalent to what he calls the Batch Layer and the Serving Layer in his book.
But I'm kind of getting the impression this works without any speed layer and is expected to be fast enough as-is.
Rama codifies and integrates the concepts I described in my book, with the high level model being: indexes = function(data) and query = function(indexes). These correspond to "depots" (data), "ETLs" (functions), "PStates" (indexes), and "queries" (functions).
Rama is not batch-based. That is, PStates are not materialized by recomputing from scratch. They're incrementally updated either with stream or microbatch processing. But PStates can be recomputed from the source data on depots if needed.
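A toy sketch of that model, where the depot is an append-only log, an "ETL" function folds events into an index (standing in for a PState), and queries read the index. All the names here are illustrative, not Rama's actual API:

```python
from collections import defaultdict

depot = []                       # append-only event log (the "depot")
vote_counts = defaultdict(int)   # the materialized index (a "PState" stand-in)

def etl(event, index):
    # Incremental update: apply one event to the index.
    index[event["post"]] += event["delta"]

def append(event):
    depot.append(event)
    etl(event, vote_counts)      # stream processing keeps the index current

def query(post):
    return vote_counts[post]

def rebuild():
    # Because the depot retains the source data, the index can also be
    # recomputed from scratch: indexes = function(data).
    fresh = defaultdict(int)
    for event in depot:
        etl(event, fresh)
    return fresh

append({"post": "p1", "delta": 1})
append({"post": "p1", "delta": 1})
assert query("p1") == 2          # incrementally maintained
assert rebuild() == vote_counts  # and recoverable from the source data
```

The incremental path is what serves normal traffic; the rebuild path is what makes recomputation possible when needed.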
Forgive me if I’m misunderstanding things, but this seems quite similar to what Materialize and ReadySet do, but like “as a library”, because Rama doesn’t use a “separate” layer for the storage stuff. Is that correct-ish?
Or, maybe their pitch is that the streaming bits are so fast, you can just await the downstream commit of some write to a depot and it'll be as fast as a normal SQL UPDATE.
It’s fast until it’s not. Making a post and then hitting reload and not seeing it can be very jarring for the user. Definitely something to think about.
What do you mean? Every post I do shows up instantly.
Reloading the page from scratch can be slow due to Soapbox doing a lot of stuff asynchronously from scratch (Soapbox is the open-source Mastodon interface that we're using to serve the frontend). https://soapbox.pub/
Yes. Depot appends by default don't return success until colocated streaming topologies have completed processing the data. So this is one way to coordinate the frontend with changes on the backend.
Within an ETL, when the computations you do on PStates are colocated with them, you always read your own writes.
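The ack semantics can be illustrated with a minimal sketch (again, not Rama's API): the append blocks on a completion signal from the colocated processing, so a read issued right after the append sees the result.

```python
import threading

index = {}   # stands in for the colocated PState

def run_topology(event, done):
    index[event["key"]] = event["value"]   # materialize into the index
    done.set()                             # signal: processing complete

def append_with_ack(event, timeout=5.0):
    done = threading.Event()
    threading.Thread(target=run_topology, args=(event, done)).start()
    # The append only returns success once colocated processing finishes.
    if not done.wait(timeout):
        raise TimeoutError("colocated topology did not complete")

append_with_ack({"key": "post:1", "value": "hello"})
assert index["post:1"] == "hello"   # read-your-own-writes after the ack
```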
That's part of designing Rama applications. Acking is only coordinated with colocated stream topologies – stream topologies consuming that depot from another module don't add any latency.
Internally Rama does a lot of dynamic auto-batching for both depot appends and stream ETLs to amortize the cost of things like replication. So additional colocated stream topologies don't necessarily add much cost (though that depends on how complex the topology is, of course).
I have to say, in my ~12 years as an active Redditor I can't recall a time when I saw any real state issues, even with rapidly changing votes, etc. Bravo!? Now that we're beyond the days of molten servers, its overall reliability in the face of massive spiky traffic is quite a feat.