If you aspire to deliver the same software solution to multiple customers then most likely you are in the SaaS business.
And as Marc Benioff famously observed a long time ago, "Multi-tenancy is a requirement for a SaaS vendor to be successful."
I think about this often because this was a crucial thing we got wrong at my last company.
We got a few large enterprise customers early on, which was great. But each had some unique requirements at the time. With hindsight, they weren't really that unique at all, but there was only a single customer asking for each one.
We took the decision to use separate databases (schemas in the Oracle world) for each customer. That way we could more easily roll out features to individual customers. We were careful to keep only a single codebase; I'd seen the forked-codebase version of this go wrong before. But still, any customer could be on their own version of that codebase at any time, with their own schema that matched the code at that version.
I now think of this approach as maybe getting into heroin (no direct experience). Feels great, powerful, you can do anything. But ultimately it will kill you, and the longer you do it, the harder it is to get back onto the path of righteousness - a decent multi-tenant architecture.
Even though some of the benefit of multi-tenancy is (supposedly) simpler management (one set of resources), multi-tenancy can actually become more difficult as the customer pool gets bigger, or workloads get more uneven. Maintenance on the whole pool becomes more and more problematic, and you try to patch around or delay it by scaling vertically. You basically run into every possible problem and hit limits you hadn't thought of sooner than with single-tenancy. And worst of all, it's impacting more and more customers.
Traffic shaping can help with customers that need a special SLA, even routing some traffic to servers that no other users can access. You can also corral bad actors in the same way.
So customer A is actually on load balancer #1, Elasticsearch #3, DB cluster #2, app server ASG #4, etc. Then when you need to carve out separate resources for VIP customers, you just add a new number and assign only them to it.
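The numbered-pool idea above can be sketched as a small routing table. Everything here is illustrative (class names, resource kinds, endpoint naming), not any particular vendor's API:

```python
# Resource kinds a tenant gets routed across; purely illustrative.
RESOURCE_KINDS = ["load_balancer", "elasticsearch", "db_cluster", "app_server_asg"]

class ResourceRouter:
    """Maps each tenant to a numbered pool per resource kind."""

    def __init__(self):
        self.assignments = {}  # tenant -> {resource kind: pool number}

    def assign(self, tenant, **pools):
        # e.g. assign("customer_a", load_balancer=1, elasticsearch=3, ...)
        self.assignments[tenant] = pools

    def carve_out(self, tenant, pool_number):
        """Give a VIP tenant the same dedicated number for every resource."""
        self.assignments[tenant] = {k: pool_number for k in RESOURCE_KINDS}

    def endpoint(self, tenant, kind):
        # Resolve "which concrete resource does this tenant hit?"
        return f"{kind}-{self.assignments[tenant][kind]}"

router = ResourceRouter()
router.assign("customer_a", load_balancer=1, elasticsearch=3,
              db_cluster=2, app_server_asg=4)
router.carve_out("vip_customer", 5)  # dedicated pool #5, nobody else on it
```

The nice property is that "go dedicated" is just a new number, not a new architecture.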
Multi-tenant architecture with horizontal scaling!
It's like spoon theory. As a product team, you only have so many spoons. Every bit of difficulty you spend on maintaining a system's operation takes away spoons that could be used for other parts of the product. Regardless of whether you use multi-tenant or hybrid or single-tenant, if it starts to take away all your spoons, you should be ready to try a different model and see if you get some spoons back. (I think this applies to all aspects of a product team, not just operations)
The main problem with multi-tenancy like that is migrations. They can take a lot of time. There are two strategies: backward and forward compatibility, or blue-green deployments with Istio or equivalent tooling (e.g. if one server is behind, it is still routed to the old front-end).
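The backward/forward-compatibility strategy usually means expand/contract migrations: add the new column first so old code keeps working, have new code write both, and only drop the old column once everything is upgraded. A minimal sketch using sqlite3; the table and column names are made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('alice')")

# Expand: add the new column with a default, so old code that never
# mentions it keeps working unchanged.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT DEFAULT ''")

# Old code path: unaware of display_name, still valid after the migration.
old_row = db.execute("SELECT id, name FROM users").fetchone()

# New code path: writes both columns during the transition window.
db.execute("INSERT INTO users (name, display_name) VALUES ('bob', 'Bob B.')")
new_row = db.execute(
    "SELECT name, display_name FROM users WHERE name = 'bob'").fetchone()

# Contract (later, once every server runs the new code):
#   ALTER TABLE users DROP COLUMN name   -- or a backfill + rename
```

The key point is that at no moment is there a schema that either the old or the new code version cannot live with.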
The key to implementing a multi-tenant application in Spring is AbstractRoutingDataSource. It is not that hard; see https://stackoverflow.com/questions/49759672/multi-tenancy-m...
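The idea behind AbstractRoutingDataSource is simple: resolve a lookup key from the current request context and route to the matching connection. Here is a hedged Python analogue (not Spring; all names are illustrative) using a context variable as the lookup key:

```python
import sqlite3
from contextvars import ContextVar

# Per-request tenant context; in Spring this is the lookup key returned
# by determineCurrentLookupKey().
current_tenant: ContextVar[str] = ContextVar("current_tenant")

class RoutingDataSource:
    """Return a different connection depending on the current tenant."""

    def __init__(self):
        self._pools = {}

    def register(self, tenant, dsn):
        self._pools[tenant] = sqlite3.connect(dsn)

    def connection(self):
        # Route based on the context, not on anything the caller passes in.
        return self._pools[current_tenant.get()]

ds = RoutingDataSource()
ds.register("acme", ":memory:")
ds.register("globex", ":memory:")

current_tenant.set("acme")
ds.connection().execute("CREATE TABLE t (v TEXT)")
ds.connection().execute("INSERT INTO t VALUES ('acme only')")
# After current_tenant.set("globex"), ds.connection() is a different
# database entirely: globex has no table 't' at all.
```

Business code just asks for "the connection"; which physical database that means is decided once, at the routing layer.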
It is really just another set of tradeoffs, and I think the author does a good job of detailing them from the perspective of a company with internal customers. With external customers, the calculus can change.
> But ultimately it will kill you, and the longer you do it, the harder it is to get back onto the path of righteousness - a decent multi-tenant architecture.
The nice thing about a multi-tenant architecture is that it enforces consistency and therefore gives you scale. This can be achieved with single tenant as long as you are ruthless about it (that's what we have done at my current job).
We run separate servers for each client (they can also self host). We built a system to let clients control their version, so they can stay at an earlier release. But everyone stays on the mainline codebase and, critically, database schema.
But sticking with a multi-tenant architecture for a publicly facing application will make it easier to enforce that consistency. That may lose you some sales, but will lead to better scalability.
That's assuming a single multi-tenant saas footprint.
There is absolutely nothing preventing you from standing up a footprint in each compliance region, with customers assigned to the region that satisfies their requirements for data location.
You get the benefit of fewer footprints to manage, while still meeting the requirements necessary to serve your customers.
And if you have a customer that absolutely must have isolated infrastructure, stand one up for them and pass along the increased cost associated.
This setup is how every company I have worked for that has had government clients handles them: a dedicated 'footprint' isolated from your commercial footprint, even if it remains on the same cloud provider. AWS even has GovCloud regions specifically for this scenario.
We also have the ability for specific modules of the monolithic application to have their own database that is scoped to that module (we’re using Elixir, so client-specific implementations are scoped as an umbrella application).
Then we have satellite systems. These are _usually_ edge servers that have client-specific configuration and branding on what they share to the web. They may also have their own database used for various other purposes. Most of these are built so that the data within is isolated from the core system and can be put in appropriate locations for data residency.
Our roll-outs first apply migrations, then deploy the code. Migrations are applied by iterating over all databases, and that can sometimes take up to several hours before the code is finally deployed (so that it can use the new schema in every DB). It creates a very large window where old code can see new DB schemas, so we have to be careful to keep our migrations forward- and backward-compatible.
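That iterate-then-deploy ordering can be sketched in a few lines; the tenant names and the DDL here are made up for illustration:

```python
import sqlite3

# Made-up migration; in the window below, old code already sees it.
MIGRATION = "ALTER TABLE orders ADD COLUMN notes TEXT DEFAULT ''"

# One database per tenant (in-memory stand-ins here).
tenant_dbs = {name: sqlite3.connect(":memory:") for name in ("t1", "t2", "t3")}
for db in tenant_dbs.values():
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

migrated = []
for name, db in tenant_dbs.items():
    db.execute(MIGRATION)   # may take hours across many real databases
    migrated.append(name)

# Only once `migrated` covers every tenant would the new code be deployed;
# until then, old code runs against the new schema, hence the
# compatibility requirement.
```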
Microservices have a different approach: usually there's a single database for all tenants (a database per microservice, of course); what is sharded instead is the tables. There are tables like "user_0", "user_1", et cetera; they are created automatically when needed. This allows some degree of isolation (although several tenants can occupy the same sharded tables), but the main benefit is that scanning such tables is faster. The migration mechanism can enumerate all such sharded tables and apply migrations to them one by one.

For data isolation, there's a requirement that each table must have an "accountID" column, which must always be checked on every access to the repository at the infrastructure level (otherwise it shouldn't pass code review). The account ID itself comes from the JWT in the request headers, so a malicious tenant can't access other tenants' data by just changing the account ID in the request. Business logic doesn't pass account IDs around in function signatures; it happens transparently at the infrastructure level (the ID is passed to the repository constructor when building the service graph in the dependency container).
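The repository-constructor trick described above can be sketched like this: the repository receives the account ID once, at construction time, and appends it to every query itself, so callers can't forget the check. All names (table, column, class) are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_0 (account_id TEXT, name TEXT)")
db.executemany("INSERT INTO user_0 VALUES (?, ?)",
               [("acct-1", "alice"), ("acct-2", "mallory")])

class UserRepository:
    """Scoped to one tenant at construction; every query is filtered by
    account_id at this layer, never by the business logic above it."""

    def __init__(self, conn, account_id):
        self._conn = conn
        self._account_id = account_id  # taken from the JWT, not the caller

    def list_names(self):
        rows = self._conn.execute(
            "SELECT name FROM user_0 WHERE account_id = ?",
            (self._account_id,))
        return [r[0] for r in rows]

# Built once when wiring the service graph in the dependency container.
repo = UserRepository(db, "acct-1")
names = repo.list_names()   # note: no account_id in the signature
```

Because the filter lives in exactly one place, "forgot the accountID check" becomes a construction-time concern instead of something every query author must remember.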
This was kinda scary to read. Just because an API isn't documented doesn't mean it isn't public! Broken Access Control is the top OWASP issue in 2021: https://owasp.org/Top10/
Visiting the sample 'Comments' link and looking at the network console in Firefox revealed this call: https://cmmnts-api.i.bt.no/v1/publications/e24/article:e24:a... DESC&replies=ASC
- Shared compute with tenant context passed via JWT
- Data isolation by either physical separation (i.e. separate database) or logical separation (i.e. separate schema, table, or column association) depending on requirements
- Enforcing tenant context at the API gateway
- Always leveraging policies and ACLs via JWT to enforce secure data retrieval
- Sometimes using RLS within the database
- Either universal data encryption or per tenant depending on requirements
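The tenant-context-via-JWT items above boil down to: verify the token's signature, then trust the tenant claim inside it rather than anything else in the request. A minimal HS256 sketch using only the standard library; the claim name and secret are made up, and a real service should use a vetted JWT library rather than this:

```python
import base64, hashlib, hmac, json

SECRET = b"demo-secret"  # made-up; real key material is per-environment

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def tenant_from_token(token: str) -> str:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    raw = base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4))
    # Trusted only because the signature checked out above.
    return json.loads(raw)["tenant_id"]

token = make_token({"sub": "user-1", "tenant_id": "acme"})
tenant = tenant_from_token(token)
```

The gateway does this once per request and stamps the resolved tenant onto the downstream context; no handler ever reads a tenant ID from a query parameter or body.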
- Shared compute is actually the part that to me means diddly squat, and customers seem to prefer dedicated. It costs nothing to spin up more stateless-ish app servers dedicated to a tenant. It's the DB, logs, caching, load balancers, queues, and monitoring I don't want to split up. Also, there's still nothing wrong with normal session stores in Redis.
- Separate schemas are not preferred but are kinda fine; at the very least, don't create separate DB accounts per tenant. The credential/connection management will make your life a living nightmare and doesn't work with SQL proxies.
- We must seriously have vastly different JWT experiences. Every super businessy app I've made quickly hits the ceiling on how much junk you can store in the JWT before having to punt to the db for user permissions.
- RLS is dope and you should choose it every time you can. Not having to do #customers schema migrations is worth it.
It's interesting to me that you and @abraae seem to take the exact opposite view on the topic of data isolation where he/she has a much... harsher opinion:
>We took the decision to use separate databases (schemas in the Oracle world) ... I now think of this approach as maybe getting into heroin (no direct experience). Feels great, powerful, you can do anything. But ultimately it will kill you.
Of course this doesn't apply to the case of column association, but I'm interested on your take on this.
If your app is super-sensitive and security is an absolute must, then sure, using a separate database for every customer will help you answer those pesky security questions like "How do you ensure that resources are not accessed by other tenants?"
My experience though is that if you are after velocity, and ease of maintenance, then you need a single database and tables with a "tenant id" column in them.
Even simple things get hard when you have separate databases. Say I want to know how many customers have how many widgets on average. If everyone's in one database, that's a SQL query. If they're all separate, it's ten times harder to answer.
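The single-database case really is one query. A sketch with sqlite3 and a made-up widgets table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE widgets (tenant_id TEXT, widget_id INTEGER)")
db.executemany("INSERT INTO widgets VALUES (?, ?)", [
    ("acme", 1), ("acme", 2), ("acme", 3),
    ("globex", 4),
])

# One query answers "how many widgets does each customer have, on average?"
(avg_widgets,) = db.execute("""
    SELECT AVG(n) FROM (
        SELECT COUNT(*) AS n FROM widgets GROUP BY tenant_id
    )
""").fetchone()

# With a database per tenant, the same question means connecting to every
# database, running the inner COUNT(*) N times, and aggregating by hand.
```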
When is it not a good idea to leverage the database's RLS for access control?
I wonder how well row level security policies really work. There are many interesting articles in the AWS documentation for this e.g. "Multi-tenant data isolation with PostgreSQL Row Level Security" https://aws.amazon.com/de/blogs/database/multi-tenant-data-i...
A secure data model should make the tenant identifier necessary to successfully complete a query. Haven't composite keys (including composite primary keys) been around since SQL-86?
A good application layer, likewise, enforces that a tenant identifier is set on every endpoint, with no additional code, just from creating the endpoint.
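A composite primary key makes the tenant identifier part of every point lookup, so forgetting it is a visible mistake rather than a silent cross-tenant leak. Sketch with sqlite3; the table and names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE documents (
        tenant_id TEXT    NOT NULL,
        doc_id    INTEGER NOT NULL,
        body      TEXT,
        PRIMARY KEY (tenant_id, doc_id)   -- composite key: tenant leads
    )
""")
db.executemany("INSERT INTO documents VALUES (?, ?, ?)", [
    ("acme",   1, "acme's doc 1"),
    ("globex", 1, "globex's doc 1"),  # same doc_id, different tenant: fine
])

def get_document(conn, tenant_id, doc_id):
    # The lookup is only unique when both halves of the key are supplied,
    # so the tenant id cannot be left out of the endpoint's signature.
    return conn.execute(
        "SELECT body FROM documents WHERE tenant_id = ? AND doc_id = ?",
        (tenant_id, doc_id)).fetchone()

body = get_document(db, "acme", 1)
```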
We're a B2B/Enterprise SaaS and most tenants require that we erase all their data at the end of the contract. Some require customer-managed encryption keys. The only way to meet this requirement is to have every tenant isolated in their own database (and their own S3 bucket etc). If data is mixed, when one tenant leaves you must go through all copies of all backups, purge their rows, then re-save the cleaned up backups. Nearly impossible in practice.
I'd say in this case the obvious problem is the comment is associated with a tenant. The comment should be associated with a user which should (probably) be associated with exactly one tenant.