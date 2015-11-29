Hacker News
Scale Out Multi-Tenant Apps Based on Ruby on Rails (citusdata.com)
One word of warning about choosing a partitioning scheme: if you select something like customer (as recommended in the article's example), you might get to a point down the road where you realize that some customers are much, _much_ bigger than other customers. This can put you in an awkward place when a single tenant approaches the bounds of their shard, because you may have little choice but to figure out how to further subpartition their data.

That said, per-customer/user partitioning might still be your best bet given the advantages that it conveys around isolation. If I understand correctly, Citus can guarantee atomicity, consistency, and isolation (ACI*) within a transaction localized to any single partition which is a _huge_ benefit for building apps that are more tolerant to failure and problematic edge cases with very little effort on your part (compare this to Mongo, which in its latest versions is just starting to give you the "consistency" part and nothing else).

Anyway, nice work from the Citus team!

We actually have something on the roadmap to help address the case of a very large customer. In future versions of Citus you'll be able to isolate a single very large tenant to their own shard so their resource consumption won't compete as directly with other smaller users, and then if needed you could scale them out to their own physical node with no changes to your app.

You'll need multi-shard tables for big clients, along with the ability to split shards as they grow.

Is that being done via custom partitioning functions? i.e. rather than a blind hash of the partition key to the shard space, adding if/else logic to say that key ABC goes to shard XYZ?
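
To make the question concrete, here is a minimal sketch of the routing it describes: an explicit override table consulted first, with a plain hash of the partition key as the fallback. This is not Citus's implementation; `SHARD_COUNT`, `SHARD_OVERRIDES`, `shard_for`, and the shard number 42 are all hypothetical names and values chosen for illustration.

```ruby
require "zlib"

# Hypothetical number of regular, hash-addressed shards.
SHARD_COUNT = 8

# Hypothetical explicit placements: very large tenants pinned to a dedicated shard.
SHARD_OVERRIDES = { "ABC" => 42 }.freeze

# Returns the shard for a partition key: an explicit override if one exists,
# otherwise a deterministic hash of the key modulo the shard count.
def shard_for(partition_key)
  key = partition_key.to_s
  SHARD_OVERRIDES.fetch(key) { Zlib.crc32(key) % SHARD_COUNT }
end
```

With this shape, `shard_for("ABC")` always lands on the dedicated shard, while every other key is spread over the regular shards by its hash.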

How is this different from existing multitenancy gems like `apartment`?

If you don't have to share data between users, a multi-database setup is the way to go. I wrote a post about how I do it a little over a year ago...

https://tomschlick.com/2015/11/29/lessons-from-building-mult...

What is your experience with doing schema upgrades across many databases? Is it usually possible to break up schema changes into small pieces, so that they work across multiple versions of your software?

Yeah we try to minimize the changes that would require a lock. For those cases we have a flag where we can turn off each tenant individually for maintenance.

For long-running / many-record migrations, we shove those into a queue so they can be done in the background in parallel.
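
A rough sketch of that pattern, using only Ruby's standard library: tenants go into a queue, a few worker threads drain it, and each tenant is flagged as under maintenance only while its own migration runs. The `Tenant` struct, its `maintenance` and `migrated` fields, and `migrate_all` are hypothetical stand-ins, not the setup described above.

```ruby
# Hypothetical tenant record; a real app would persist these flags.
Tenant = Struct.new(:name, :maintenance, :migrated)

# Runs the given migration block once per tenant, using a pool of worker threads.
def migrate_all(tenants, workers: 4, &migration)
  queue = Queue.new
  tenants.each { |tenant| queue << tenant }
  queue.close # once drained, pop returns nil and the workers exit

  threads = Array.new(workers) do
    Thread.new do
      while (tenant = queue.pop)
        tenant.maintenance = true # take only this tenant offline
        begin
          migration.call(tenant)
          tenant.migrated = true
        ensure
          tenant.maintenance = false # always bring the tenant back
        end
      end
    end
  end
  threads.each(&:join)
end
```

A call might look like `migrate_all(tenants) { |tenant| run_migration_for(tenant) }`, where `run_migration_for` is whatever per-database migration step the app already has.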

An important aspect of the infosec philosophy my employer pushes is "blast radius". If you assume you will be compromised then you want to ensure as little data as possible ends up in that compromise.

Coupled with the opportunity for things like mass assignment vulnerabilities (i <3 the strong params pattern on rails) I am a little perturbed by the notion of my data being housed next to other customers. If that customer is a more attractive target, and a compromise is found, now I'm just along for the ride.

That all said, I'm not well apprised of what problem Citus Data is solving -- so maybe I've just read this wrong?

Thanks for sharing - author here.

I've been working on this library for 2-3 months now, if somebody gives it a try let me know how it goes :)

I'd be happy to answer any questions as well.

Does this have easy support for some kind of sub-tenant setup?

Following your example, let's say that most customers have multiple departments and that most tables have both a customer_id and a department_id.

Oh, and this looks solid. Nice work :)

That's a good question - there is nothing explicit for it, yet.

You could probably do it nonetheless with Rails' own has_many/belongs_to relationships for that sub-tenant, but the automatic adding of department_id into queries wouldn't happen.

Depending on what storage you end up using, that might matter or not. For example with Citus, you'd typically take the top-level tenant (customer_id) as a shard key, and just have all the departments of that tenant on the same node - so it would work fine.

What's the benefit of this approach vs doing scoped queries? i.e. current_user.pages.find(params[:id])

The issue is that you would be missing the tenant_id (or user_id) if you were to do `other_object.pages.find(params[:id])` (i.e. on something that's not directly below the user).

That is a problem in systems that expect the tenant_id to always be in the query, so you can locate which node the query needs to go to.
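
As a toy illustration of why the tenant_id matters for routing (an assumption-laden sketch, not how Citus or the library actually routes): a router can pick a single node only when the query conditions include the tenant, and otherwise has no way to narrow it down. `NODES` and `nodes_for` are hypothetical.

```ruby
require "zlib"

# Hypothetical list of database nodes.
NODES = %w[node-a node-b node-c].freeze

# Returns the nodes a query must be sent to, given its WHERE conditions.
def nodes_for(conditions)
  tenant_id = conditions[:tenant_id]
  # Without a tenant_id the router cannot tell where the row lives,
  # so in this sketch every node has to be asked.
  return NODES if tenant_id.nil?

  [NODES[Zlib.crc32(tenant_id.to_s) % NODES.size]]
end
```

Here `nodes_for(id: 7)` returns all three nodes, while `nodes_for(tenant_id: 3, id: 7)` returns exactly one.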

Can't the same be achieved with a has_many :through association?

It's a bit more complex than that, unfortunately.

Since Rails isn't built to include something like a tenant_id in every query, and doesn't support composite primary keys either, you'd have to always hand-write your SQL.

Feel free to take a look at the source - it effectively ends up being a default scope with some additional glue code.
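
For readers who want the general idea without opening the source, here is a plain-Ruby sketch of the "default scope" concept: a current tenant is set for the duration of a block, and every lookup silently gets a tenant_id condition added. It does not use ActiveRecord and is not the library's code or API; `CurrentTenant`, `Page`, and the in-memory `RECORDS` are hypothetical stand-ins.

```ruby
# Hypothetical holder for the tenant that is active on the current thread.
module CurrentTenant
  def self.id
    Thread.current[:current_tenant_id]
  end

  # Sets the tenant for the duration of the block, then restores the previous one.
  def self.with(tenant_id)
    previous = id
    Thread.current[:current_tenant_id] = tenant_id
    yield
  ensure
    Thread.current[:current_tenant_id] = previous
  end
end

# Hypothetical model backed by an in-memory array instead of a database.
class Page
  RECORDS = [
    { id: 1, tenant_id: 10, title: "Home" },
    { id: 2, tenant_id: 20, title: "Pricing" },
    { id: 3, tenant_id: 10, title: "About" }
  ].freeze

  # The "default scope": a tenant_id condition whenever a tenant is active.
  def self.default_conditions
    CurrentTenant.id ? { tenant_id: CurrentTenant.id } : {}
  end

  # Merges the default scope last, so callers cannot override the tenant.
  def self.where(conditions = {})
    effective = conditions.merge(default_conditions)
    RECORDS.select { |row| effective.all? { |key, value| row[key] == value } }
  end
end
```

Inside `CurrentTenant.with(10) { Page.where(id: 2) }` the lookup comes back empty because page 2 belongs to tenant 20, even though the caller never mentioned a tenant.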

