

Backbone.js and Capsule and Thoonk, oh my A scalable realtime architecture - evilpacket
http://andyet.net/blog/2011/nov/16/backbonejs-and-capsule-and-thoonk-oh-my-a-scalable/

======
rbranson
How is this scalable? It seems like this is a single-instance solution that in
theory could scale, but this isn't even proven, nor is a clear path given. The
author throws out a few vague ideas, and seems to vastly underestimate the
complexity of scaling out. Redis Cluster is also nowhere near stable, and
keeps getting pushed back, so how does this help anyone now?

~~~
fritzy
This is not a single instance solution. We can run as many instances as we
want across multiple servers. It is currently a single-Redis instance
solution, however we have some intelligent sharding planned.

I agree with you about Redis clustering, and would extend it by saying I don't
think it'd be helpful for this problem regardless. Intelligent sharding with
some gossip for slave promotion is where we'll have to go.

See my follow-up post tomorrow.

~~~
rbranson
Yes, single Redis, but multiple instances of the application backed with a
single datastore is hardly revolutionary. This is the classic reason why
people go with a stateless app layer, so that it's simple to scale at least a
single tier of the architecture.

I'm still not sure how this lives up to the "scalable realtime architecture"
that dominates the title. People have been building Redis-backed message
brokers for a while now. It seems like you have some untested ideas for
"intelligent sharding" that could possibly provide scalability, but that's
punting on the hard problem. Without this, this is all really pretty
pedestrian.

The gossiping sounds like an interesting lead, is this going to be discussed
in the post tomorrow?

~~~
fritzy
Evolutionary, and probably not even a unique idea. My post does go into the
intelligent sharding a bit. Essentially, we shard (in the case of &bang, on
teams) with a simple lookup to find which server a given team is on, allowing
the stateless app layer (as you rightfully call it, other than keeping track
of subscription event routing) to connect to multiple Redis servers as needed.
You could be on two teams on separate servers and it would work fine.
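For illustration, the lookup step could be as simple as the sketch below. The
server names and the shape of the lookup table are my assumptions, not
Thoonk's or &bang's actual implementation:

```javascript
// Hypothetical team -> Redis-server lookup table. In practice this mapping
// would itself live somewhere shared, not in a hardcoded object.
const shardMap = {
  'team-andyet': 'redis-a.example.com:6379',
  'team-ops':    'redis-b.example.com:6379',
};

// Each app process looks up which Redis server-set owns a team before
// connecting (or picking an already-open connection to that server).
function serverForTeam(teamId) {
  const server = shardMap[teamId];
  if (!server) throw new Error('unknown team: ' + teamId);
  return server;
}

// A user on two teams may be routed to two different Redis servers,
// and the stateless app layer doesn't care.
serverForTeam('team-andyet'); // one shard
serverForTeam('team-ops');    // possibly another
```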

I also agree that people have been using Redis for messaging, however, what
we're doing is a step past that. Again, probably others are doing this. We
have atomic verbs for dealing with higher level objects in Redis that use
Redis-PubSub for communicating the changes to the objects. This way, whenever
an object (feed) is changed (publish/edit/delete/reposition), it can bubble up
to the user or processes that care. These feeds are broken down not only by
topic but by subscribe-able units of interest as well. In this way, processes,
users, etc. only get updates on data that is relevant to them.

I'm working on converting these verbs to Redis-Lua scripts as my tests have
shown it decreases CPU time and reduces atomicity code (especially in Node.js
where watch->multi->exec callback stacks can be interrupted by other events).
I also expect it to make supporting Thoonk in multiple languages easier as the
core code will be shared.
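For illustration, one of those feed verbs as a server-side Lua script might
look something like the sketch below. The key names and payload shape are
assumptions, not Thoonk's actual schema:

```lua
-- Hypothetical "publish" verb for a feed, run via EVAL.
-- KEYS[1] = feed item hash, KEYS[2] = feed id list, KEYS[3] = pubsub channel
-- ARGV[1] = item id, ARGV[2] = item payload
local isEdit = redis.call('HEXISTS', KEYS[1], ARGV[1])
redis.call('HSET', KEYS[1], ARGV[1], ARGV[2])
if isEdit == 0 then
  redis.call('RPUSH', KEYS[2], ARGV[1])
end
redis.call('PUBLISH', KEYS[3],
  (isEdit == 1 and 'edit:' or 'publish:') .. ARGV[1])
return isEdit
```

Since EVAL executes atomically on the Redis server, the check-then-write
happens without a watch->multi->exec retry loop in the client, which is where
the CPU and atomicity-code savings described above would come from.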

The gossiping is currently being discussed in the Redis mailing list and may
end up taking a similar approach to Hadoop's intelligent clients. Antirez
would like to provide some tools to make this easier as others have rolled
their own Redis gossiping. I'll be researching that more after implementing
the team sharding. HA, while related to scaling, is a bit off topic, and I
admit to hand-waving here. For now we're focusing more on not
losing/corrupting user data.

Perhaps Henrik's title is hyperbole. I doubt the approaches that we're taking
are revolutionary, and I imagine similar things have been done before.
However, we
believe that it is a good approach, and we believe in the direction we're
taking enough to share it.

Thanks for the intelligent discussion.

~~~
rbranson
Cool, thanks for taking the time to go a bit more in depth on this. This is a
topic I'm particularly interested in, so I'm looking forward to the follow-up.

------
jeromeparadis
Looks quite promising. I've played a lot with node.js, Redis and pub/sub
wrappers around socket.io, and I came to the conclusion that this kind of
architecture scales well horizontally until Redis becomes the bottleneck.

I'll dig in the source code as I'm very interested in how it's implemented.

~~~
fritzy
My post for tomorrow has a quick section on scaling beyond a single instance
of Redis.

Essentially, intelligent sharding (in &bang's case, by team). Each Node.js
process (or other process) can look up which Redis server-set owns a team, and
can probably just stay connected to most of the Redis instances. For HA: slave
each shard, use AOF, and take off-server backups every 15m, along with a
gossip protocol for giving up on masters and promoting slaves.
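The failover side of that could be sketched roughly as below. This is a toy
simulation of the shard bookkeeping only; a real gossip protocol needs quorum
and fencing, and every name here is illustrative:

```javascript
// Each shard tracks a master, its slave, and a liveness flag that a gossip
// protocol would flip when enough peers agree the master is gone.
const shards = {
  'team-andyet': { master: 'redis-a1:6379', slave: 'redis-a2:6379', alive: true },
};

function endpointFor(teamId) {
  const shard = shards[teamId];
  if (!shard.alive) {              // gossip gave up on the master:
    shard.master = shard.slave;    // promote the slave...
    shard.slave = null;            // ...until a fresh slave is attached
    shard.alive = true;
  }
  return shard.master;
}
```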

According to our tests, we don't have to worry about it for a while, but we've
got a plan regardless and will start implementing it soon.

------
fritzy
I'm working on a post for tomorrow that goes into the details of making a
feed-driven single-page app and a bit of the philosophy of our design choices
and Thoonk itself.

------
collint
I don't see anything to deal with conflicts and operations crossing each other
on the wire. The sort of thing that Operational Transformation deals with.

~~~
ajessup
Depends what you're trying to keep concurrent, and your concurrency policy. OT
is great for optimistic concurrency on complex structured documents, but
probably overkill for simple models (like to-do items) with simple attributes
(like title, is-done etc.). For that you could get away with, e.g., a revision
counter on the model.
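A minimal sketch of that revision-counter approach: each save must cite the
revision it was based on, or it is rejected as stale. The function names here
are made up for illustration:

```javascript
function makeModel(attrs) {
  return { rev: 0, attrs };
}

// Optimistic concurrency: reject a save whose base revision is out of date.
function save(model, baseRev, newAttrs) {
  if (baseRev !== model.rev) {
    return { ok: false, reason: 'conflict' };   // someone else saved first
  }
  model.attrs = Object.assign({}, model.attrs, newAttrs);
  model.rev += 1;
  return { ok: true, rev: model.rev };
}

const todo = makeModel({ title: 'ship it', done: false });
const a = save(todo, 0, { done: true });        // succeeds, rev -> 1
const b = save(todo, 0, { title: 'renamed' });  // stale base rev, rejected
```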

Because of the performance implications of different concurrency policies, you
probably don't want one baked too deeply into any framework, since there's
some stuff you definitely want strict conflict/error resolution on, and other
things you don't.

~~~
fritzy
Essentially, that's our take as well. However, we do have a plan for dealing
with conflicts and missed updates soon -- probably after Thoonk.js 1.0 slated
for late December. Here's how I replied to this question on the blog comments:

The data in the Thoonk feeds is never edited locally on the client. Any user
actions that change data go out as a websocket RPC call, and get placed in a
job queue. The workers validate the data, check ACL, and then update the
corresponding feeds, which then bubbles back up to the user. This happens
nearly instantly, and the feed updates are atomic. If two users edit an
object, then the last edit wins. For the data we have, this isn't a problem.
However, we've got a plan for dealing with concurrency and conflict resolution
in the future so that we can handle being offline for periods of time, and
detecting and dealing with conflicts.
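The queue-then-apply flow amounts to something like this toy sketch, with the
validation/ACL step stubbed out and all names invented for illustration:

```javascript
const queue = [];
const feed = {};   // id -> item, standing in for a Thoonk feed

// Client edits never touch the feed directly; they become jobs.
function enqueueEdit(user, id, attrs) {
  queue.push({ user, id, attrs });
}

// A worker pulls a job, would validate data and check ACLs, then updates
// the feed, which bubbles the change back to subscribers.
function workOnce() {
  const job = queue.shift();
  if (!job) return;
  feed[job.id] = Object.assign({}, feed[job.id], job.attrs,
                               { editedBy: job.user });
}

enqueueEdit('alice', 't1', { title: 'draft post' });
enqueueEdit('bob',   't1', { title: 'final post' });
while (queue.length) workOnce();   // bob's later edit simply wins
```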

For &bang in particular, we'll probably just let the user that owns the task
resolve the conflicts. "You queued an update to this task while offline, but
Bob edited as well. Which version would you like to keep?" In general, you
can't edit each other's data, but you can add to it, so it isn't much of a
problem.

Thoonk 1.0 will have update history and incrementing revision numbers for
feeds that will give you enough information to resolve your conflicts
(whatever method you choose) and will help your app recognize and retrieve
missed updates.
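A sketch of how an incrementing revision number could let a client retrieve
what it missed after being offline; the history shape and function name are
assumptions, not the planned Thoonk 1.0 API:

```javascript
// history: array of { rev, change } entries the feed keeps per update.
// A client remembers the last revision it saw and asks for everything newer.
function missedUpdates(history, lastSeenRev) {
  return history.filter((entry) => entry.rev > lastSeenRev);
}

const history = [
  { rev: 1, change: 'publish:t1' },
  { rev: 2, change: 'edit:t1' },
  { rev: 3, change: 'delete:t2' },
];

// A client that went offline after rev 1 catches up:
const missed = missedUpdates(history, 1);
```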

