Hacker News new | comments | show | ask | jobs | submit login

Well, the means I know of is affinity: You serve the user some kind of session identifier, like a cookie, or special URLs that encode the session ID. Then, when the user PUTs or POSTs something, you instruct the various layers of your stack to read that session identifier and route based on it, directing that user's requests to a specific machine for a while – the one that is most likely to be up-to-date on what that user has recently done – instead of randomly assigning each request to a different machine.

Then you put a timeout on the identifier so that the user eventually goes back to using randomly-assigned machines. And perhaps you try to ensure that the system doesn't break too badly if the user submits a write to a machine that promptly goes offline for a while. And you pray that there are no second-order effects. And, like everything else you do, this makes your caching strategy more complicated: Can you afford to serve a given request from cache without thinking, or do you need to waste time processing a session cookie, and then perhaps waste more time and RAM by having your back-end generate a page that could just as well have been served from cache after all?

That explains how you serve the same data to that user, but the more interesting problem is: how do you propagate the changes to the rest of the users over time?

Do you run some sort of backlog daemon that moves stuff between the different shards? And how is that faster than doing so immediately (the same data would be moved either way...)?

Scalability isn't about it being faster, just "able to be scaled". Sometimes it's about being able to make it faster by adding hardware and configuration.

The generic answer must of course be: It depends on your architecture.

My answer assumed that we were talking about replication lag, where you are copying data among the back-end data stores "immediately", which is to say "quickly, hopefully on the order of milliseconds but you can't count on that". Unfortunately, depending on the design of your database, replication is often not "immediate" enough to guarantee consistent results to one user even under the best conditions, let alone when the replication slows down or breaks, and that's why you might build something like the affinity system I just outlined.

The actual mechanism of replication depends on the data store. MySQL replication involves having a background daemon stream the binary log of database writes to another daemon on a slave server which replays the changes in order. (Of course, multiple clients can write to MySQL at a time, and ordering all those writes into a consistent serial stream is MySQL's job.) Other replication schemes involve having the client contact multiple servers and write directly to each one, aborting with an error if a critical number of those writes are not acknowledged as complete - and then there presumably needs to be some kind of janitorial process that cleans up any inconsistencies afterward. (I'm no CS genius; consult your local NoSQL guru for more information. Needless to say, you should strive to buy this stuff wrapped up in a box rather than build it yourself.)

As for "shards", that's different. My understanding of the notion of a "shard", as distinct from a "mirror" or a "slave", is that it's a mechanism for partitioning writes. If you've built a system with N servers but every write has to happen on each one, the throughput will be no higher than on a system with one server and one write, and you're doing "sharding" wrong. What you wanted was mirroring, and hopefully you weren't expecting that to help scale your write traffic.

If data does need to be spread around from shard to shard, hopefully it has to happen eventually rather than quickly, and some temporary inconsistencies from shard to shard are not a big problem. So you can use something like a queueing system: If you have some piece of data that needs to be mailed to N databases, you put it on the queue, and later a worker comes along and takes it off the queue and tries to write it to all N databases, and when it fails with one or two of the databases it puts the task back on the queue to try again later. (Or, perhaps, you put N messages on the queue, one per database, and run one worker for each database.) You'll also need a procedure or tool to deal with stale tasks, and overflowing queues, and the occasional permanent removal of databases from the system.

This soliloquy is looking long enough, now, so it seems like a good time to reiterate patio11's point: Don't build more scaling than you need right now. Hire someone to convince you not to design and build these things.

Thanks, this makes sense. I've also heard host affinity referred to as stickiness before.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact