
Multi-Tenant Architectures - sirkarthik
https://blog.codonomics.com/2020/08/multi-tenant-architectures.html
======
hashamali
I’ve found shared app, shared database completely workable by utilizing
Postgres’ row level security. Each row in any table is locked by requiring the
session’s “tenant.id” setting to match the row’s tenant_id column. At the
application level, every request sets the appropriate tenant ID at request
time. You get the data “isolation” while using the simplest infrastructure
setup.
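
A minimal sketch of what that can look like with psycopg2 (the table and
policy names are illustrative, and the app must connect as a role without
BYPASSRLS; only the “tenant.id” setting name is taken from the comment above):

    import psycopg2

    # One-time setup per tenant-scoped table (run as the table owner).
    SETUP_SQL = """
    ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
    -- Only rows whose tenant_id matches the per-session setting are visible.
    CREATE POLICY tenant_isolation ON invoices
        USING (tenant_id = current_setting('tenant.id')::uuid);
    """

    def query_as_tenant(conn, tenant_id, sql, params=()):
        """Pin the connection to one tenant, then run the query under RLS."""
        with conn, conn.cursor() as cur:
            # set_config(..., true) scopes the setting to this transaction only.
            cur.execute("SELECT set_config('tenant.id', %s, true)",
                        (str(tenant_id),))
            cur.execute(sql, params)
            return cur.fetchall()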

~~~
number6
What about Postgres schemas to isolate tenants?

https://www.postgresql.org/docs/current/ddl-schemas.html
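
For comparison, a rough sketch of the schema-per-tenant variant (schema and
table names are illustrative; the tenant name must be validated before being
interpolated into the DDL):

    import psycopg2

    def create_tenant(conn, tenant):
        """Give the tenant its own schema with its own copy of the tables."""
        with conn, conn.cursor() as cur:
            cur.execute(f'CREATE SCHEMA IF NOT EXISTS tenant_{tenant}')
            cur.execute(f'CREATE TABLE IF NOT EXISTS tenant_{tenant}.invoices '
                        '(id serial PRIMARY KEY, total numeric)')

    def use_tenant(conn, tenant):
        """Point unqualified table names at this tenant's schema."""
        with conn.cursor() as cur:
            cur.execute("SELECT set_config('search_path', %s, false)",
                        (f"tenant_{tenant}",))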

~~~
treis
These guys wrote the most popular Ruby on Rails multitenant gem and eventually
moved away from it:

https://influitive.io/our-multi-tenancy-journey-with-postgres-schemas-and-apartment-6ecda151a21f

Some things were mistakes but others look like pretty fundamental flaws.
Performance is a problem, database changes are a problem, and you aren't able
to query across tenants.

------
kissgyorgy
This is my favorite article on the topic:
http://ramblingsofraju.com/wp-content/uploads/2016/08/Multi-Tenant-Data-Architecture.pdf

It was once available on Microsoft's MSDN site, but I can't find it on
microsoft.com anymore.

~~~
amanzi
This page on the Azure SQL docs site has lots of great info too:
https://docs.microsoft.com/en-us/azure/azure-sql/database/saas-tenancy-app-design-patterns

------
zzzeek
Pretty telling that the top two comments at the moment endorse the two most
diametrically opposite approaches possible: single app and single set of
tables with row level security, vs totally separate apps + separate dbs
entirely.

I think it really depends on the kind of application and the kind of user /
customer you're dealing with. I'd probably lean towards "single" database with
horizontal sharding.

~~~
stingraycharles
Yup, we use “single tenant” as a major upsell to enterprises, and just run a
kubernetes cluster for each of these, with their own maintenance / release
cycles.

But we can only do that because they pay a lot of money for this stuff. Our
“shared, multi-tenant” environment is an order of magnitude cheaper.

~~~
HatchedLake721
Out of personal interest, as someone who’ll be offering pricier “single
tenant” or “we’ll deploy to your AWS account” options, how do you price this?

~~~
polack
We charge about 10x for enterprise customers. The only difference between what
they get and what a normal customer gets is that we deploy the enterprise
customers on their own servers/network.

Considering how long and exhausting the sales process is with enterprise
customers, I think the price is fair, even though the extra cost to operate
that kind of customer is negligible.

~~~
user5994461
I'd add that the minimum bill is $10000 for anything enterprise. That's a bare
minimum to handle all the time and extra work they will incur (months of sales
process, dedicated AWS instances, heavy support burden, single sign on
integrations, etc...).

As soon as you hear any of these, it's the hint you're dealing with enterprise
and you have to up the price tenfold.

------
Geee
Why is the shared app & shared database the most complex beast? Technically
it's the simplest and requires the least amount of work to set up and
maintain. You can even run the app & db on the same server. The complexity of
maintaining multiple app & db instances isn't worth it, at least when you're
starting out.

~~~
laluser
Everything is simple when you're starting out. Over time, you'd see that users
can have wildly different load patterns, which will affect how you scale and
shard your database over time. It's a huge pain to manage and grow.

~~~
zepolen
It's way easier to scale a shared app/db; anyone who tells you different
hasn't had to do devops.

Rule No. 1: never shard by account_id, Pareto is just waiting to kick you in
the ass. Always shard by whatever gives the best distribution of workload.

There are more rules of course.

~~~
sb8244
I've always considered data isolation as the #1 priority when it comes to
tenancy. Does that mean that the sharding should de facto be done on the
tenant ID?

~~~
pietherr
You shard for load distribution. If data isolation is #1 priority, use
separate tables, schemas, database instances.

~~~
zepolen
Separate disks, separate machines, hell, even separate networks; proper data
isolation can be super expensive to do right.

------
techdragon
Writing a shared app, shared database tenant isolation database middleware for
Django was one of the most interesting challenges I’ve had over the years. The
library was 100% tested with Hypothesis to randomise the data, and it used
ULIDs to allow for better long-term tenant sharding; since ULIDs are
compatible with UUIDs, they can be dumped/propagated into other systems for
analysis/analytical queries. It was quite a lesson in what 100% test coverage
does not actually prove, since I still had bugs at 100% coverage that took
work to chase down: side effects, false positives/negatives, etc.

~~~
sirkarthik
Thanks for sharing your experience. Mind sharing the details of the bugs and
the middleware you were writing? Is that an open-source library that I can
take a peek into?

~~~
techdragon
I ran into so many interesting bugs I’ve honestly forgotten most of them, but
I do remember that many of them came from the extensive use of UUIDs and the
way it interacted with more powerful Django ecosystem tools, things like
exposing a complete JSON-API endpoint, which the “query building” machinery
was generally not designed to account for. My favourite is having to
selectively use thread locals to disable the query filtering layer for very
specific internal queries that a few libraries used. I tried to get it working
with context vars for future Django async support but never had the time to
finish that work.
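
An illustrative sketch of that escape hatch (not the actual library): a
thread-local flag that lets specific internal queries bypass the automatic
tenant filter.

    import threading
    from contextlib import contextmanager

    _state = threading.local()

    @contextmanager
    def bypass_tenant_filter():
        """Temporarily disable tenant scoping for the current thread."""
        _state.bypass = True
        try:
            yield
        finally:
            _state.bypass = False

    def scope_to_tenant(queryset, tenant_id):
        """Called by the filtering layer before a query runs."""
        if getattr(_state, "bypass", False):
            return queryset                    # internal query: see everything
        return queryset.filter(tenant_id=tenant_id)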

The code was actually written with open sourcing it in mind. The amount of
effort involved pushed me to select License Zero, specifically the Prosperity
Public License
[https://prosperitylicense.com](https://prosperitylicense.com) (the legal TLDR
is “open”, but if you use it to make money you have to pay me something I
agree to). I’m hoping to do a significant refactoring before I mark the 1.0
version. I have some ideas that may be vastly cleaner but they really require
nailing the whole thread local / async context vars lifecycle, so I’ve kind of
tried to limit who might start relying on it based on how extensively tested
it is.

------
kissgyorgy
I read multiple articles on the topic years ago, dealt with and designed
multi-tenant systems, and my approach is very simple for small developer
teams: separate databases, separate app instances running on separate domains
for every tenant. This is the least technically complex to implement and there
are very few mistakes you can make (only deployment). There are a lot of tools
nowadays which can help you automate and isolate the environment for these
(Docker, Ansible, whatever). Also it can be the most secure architecture of
all.

~~~
saberdancer
How is that multitenant? That's just running a separate instance of
application for each tenant.

I agree that it is a completely valid approach, especially if you can afford
it, but in my opinion it is just a single-tenant application that can be
easily spun up. For example, what if you want to manage all of your "tenants"
within an admin interface? With a separate instance for each of your tenants
this is not really possible without additional development or a completely
separate admin application.

~~~
social_quotient
I agree with you here. We typically only advocate for this if the budget/time
is low and the next tenant isn't likely to arrive soon. Then we learn all we
can on the first couple of clones and move to a real multi-tenant design,
while also handling some tech debt we created the first time around.

What tends to happen is a first immediate client is needed along with a couple
of sales demo clients. This buys time for the business to see if it’s viable
before we bring on complexity.

But I agree clones of a core system isn’t multi tenancy.

For what it’s worth, I tend to tell clients we won’t be sustainable after 4-5
such clones.

For people considering the clone approach, let me caution about one issue that
comes up about 90% of the time. A client will ask for a custom feature and be
willing to pay for it for that “one” clone. These divergences are allowed to
happen because we aren’t multi-tenant yet. You need to work hard to educate
during this phase so you don’t create too much work down the line when you
finally consolidate and refactor.

------
pegas1
One aspect left out is upgrading: can you always distribute new features to
all tenants at the same time? If a new feature requires some training or
organizational change, then you need to deploy at a time agreed on with the
tenant. From this point of view, models 1. and 3. are viable.

If extensions are rare, you can keep a switch for each and separate the
upgrades on model 4. However, if you gate changes behind switches, then you
have to keep the old code in conditional branches forever. High cost of
ownership.

------
saberdancer
A colleague implemented a hybrid system for a SaaS product he was leading.
Normally the product is a single app, single database with column as a tenant
ID discriminator, but for specific tenants he built in an option to specify a
separate database. This allowed all tenants that wanted higher performance or
data saved in a separate location to be able to buy in, while most of the
tenants were in a multitenant database.

The solution I implemented on my project (IoT) was shared apps, shared
database. When we were starting the project, we decided to use a column as a
discriminator and designed the system so that, for entities that need to be
tenant-specific, developers just extend an abstract class; the rest of the
system detects when you are trying to save or load such an entity and, in
those cases, applies a filter or automatically assigns the tenant ID. This
means a developer can work just like they would on a single-tenant
application. I feel this is pretty normal stuff.
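
A hedged sketch of that pattern in Django-style code (the model, manager, and
helper names are hypothetical, not from the project described above):

    from django.db import models

    def current_tenant_id():
        """Resolve the tenant for the current request, e.g. from a thread-local."""
        raise NotImplementedError

    class TenantManager(models.Manager):
        def get_queryset(self):
            # Every read is transparently scoped to the current tenant.
            return super().get_queryset().filter(tenant_id=current_tenant_id())

    class TenantOwned(models.Model):
        tenant_id = models.UUIDField(db_index=True, editable=False)
        objects = TenantManager()

        class Meta:
            abstract = True

        def save(self, *args, **kwargs):
            # Writes get the tenant assigned automatically.
            if not self.tenant_id:
                self.tenant_id = current_tenant_id()
            super().save(*args, **kwargs)

    class Reading(TenantOwned):
        # A normal model, written as if the app were single-tenant.
        value = models.FloatField()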

------
icedchai
In practice, I've seen both shared app, shared DBs, and shared app, separate
DBs. "Shared / separate DBs" is not actually so black and white. I recommend
making your system configurable so you can dedicate DBs to a specific tenant
(or group of tenants) if needed. Most of them probably won't need it...

------
awinter-py
such an interesting topic. Not faulting this article for staying focused, but
other questions here are:

\- hybrid model where paid / larger clients get dedicated hardware for perf
reasons

\- YAGNI -- will you even need multitenancy?

\- business + legal considerations; which industries have legal or regulatory
requirements not to intermix

\- sharing and permissions -- what happens the first time someone needs to
share a doc cross-account?

\- tools and codebase strategies for verifying permission model on shared arch

~~~
chiph
The business model can be the overriding factor. If you're selling to Fortune
500 firms, they will require you to allocate them their own database and
application servers. But if you're selling to small/medium business you're in
more of a volume business and you can't justify the expense at that price
point.

------
RickJWagner
The author doesn't mention application updates, which can be important.

Shared application scenarios can bring headaches when different customers want
different application behavior implemented in upgrades.

~~~
sirkarthik
Isn't that a different tangent? Does it matter, when the SaaS codebase is the
same for all tenants? If you are dealing with a different codebase for
different clients, then that is a flawed approach in that it is error prone
and not scalable as tenants increase in number.

~~~
nogabebop23
Try running a single SaaS codebase for enterprise clients. Maybe you can
upgrade everyone with CI on a single commit, but no line-of-business solution
wants that. We have to run a very strict and explicit upgrade cycle for our
apps that allows them to test extensively before committing to newer versions.

Just because you're SaaS doesn't mean your clients are...

~~~
squeaky-clean
It's not a 100% foolproof solution, but enforcing API versions in the request
helps. /latest/ is available in our preprod environment, but in production you
can only call the API with an explicit version.
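
A rough sketch of that rule (Flask is used only for illustration; the
environment flag and URL scheme are assumptions, not how the poster's system
actually works):

    import os
    import re
    from flask import Flask, request, abort

    app = Flask(__name__)
    IS_PREPROD = os.environ.get("ENVIRONMENT") == "preprod"
    VERSIONED = re.compile(r"^/v\d+/")        # e.g. /v3/orders

    @app.before_request
    def require_explicit_version():
        if request.path.startswith("/latest/"):
            if not IS_PREPROD:
                abort(404)                    # /latest/ only exists in preprod
        elif not VERSIONED.match(request.path):
            abort(400)                        # callers must pin a version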

------
rkagerer
Perhaps I missed the point, but it seems strange and artificial to me that all
this article considers when discussing multi-tenant architectures are the
database and the app. There's so much more that goes into actually delivering
multi-tenancy in production.

Long ago I was the product manager of a complex enterprise platform which had
been heavily customized for one of our large banking customers. They hosted
the database on SQL Server shared clusters and much of the application backend
in VMware instances running on "mainframe-grade" servers (dozens of cores,
exotic high-speed storage). The hardware outlay alone was many hundred
thousand dollars, and we interfaced with no less than 5 FTE's who comprised
part of the teams maintaining it. Ours was one of a few applications hosted on
their stack.

Despite repeated assurances of dedicated resource provisioning committed to
us, our users often reported intermittent performance issues resulting in
timeouts in our app. I was the first to admit our code had lots of runway
remaining for performance optimization and more elegant handling of network
blips. We embarked on a concerted effort to clean it up and saw huge
improvements (which happily amortized to all of our other customers), but some
of the performance issues still lingered. Over and over again in meetings IT
pointed their fingers at us.

Eventually we replicated the issue in their DEV environment using lots of
scrubbed and sanitized data, and small armies of volunteer users. I had a
quite powerful laptop for the time (several CPU cores, 32GB RAM, high-end
SSDs in RAID) and during our internal testing I actually hosted an entire
scaled-down version of their DEV environment on it. During a site visit, we
migrated their scrubbed data to my machine and connected all their clients to
it. That's right, my little laptop replaced their whole back-end. It ran a bit
slower but after several hours the users reported zero timeouts. This cheeky
little demonstration finally caught the attention of some higher-up VP's who
pushed hard on their IT department. A week later they traced the issue to a
completely unrelated application that somehow managed to monopolize a good
chunk of their storage bandwidth at certain points in the day. Our application
was one of their more-utilized ones, but I bet correcting this issue must also
have brought some relief to their other "tenants".

I know this isn't a perfect example, but it demonstrates how architecture
encompasses a whole lot more than just the DB and apps. There's underlying
hardware, resource provisioning, trust boundaries, isolation and security
guarantees, risk containment, management, performance, monitoring and
alerting, backups, availability and redundancy, upgrade and rollback
capabilities, billing, etc. When you scale up toward Heroku/AWS/Azure/Google
Cloud size I imagine such concerns must be quite prominent.

~~~
hvidgaard
We often have customers complaining that our applications run significantly
slower after they receive an upgrade. Our usual procedure is always telling
them that we have not seen any performance regression internally or at other
customers, so we kindly ask them to look over what other changes they have
made to their environment, be it hardware or software.

If they insist that the problem is our software, we tell them that we will
begin troubleshooting, but if the error is outside of our responsibility, we
will bill for all the hours used. 19 out of 20 times we restore the old
version and benchmark them against each other and the performance is
comparable. At that point they go back and recheck something, and it turns out
they allocated resources differently or another application was upgraded too.

------
aszen
Earlier we used to have multiple web apps with separate databases. Now we are
using a single web app that connects to a different database and configuration
based on the subdomain, and so far it's worked great; having a single web app
really makes development and deployment a lot easier. We often host multiple
databases in a single RDS instance to reduce costs. We get data isolation and
don't have to deal with sharding. Of course, this works well for enterprise
applications with just 50-100 tenants.
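
A simplified sketch of that routing (hostnames and tenant names are made up):

    TENANT_DATABASES = {
        # Several tenant databases can live on the same RDS instance.
        "acme":   {"host": "rds-1.internal", "dbname": "acme"},
        "globex": {"host": "rds-1.internal", "dbname": "globex"},
    }

    def database_for_request(host_header):
        """Pick the tenant's database config from the request's subdomain."""
        subdomain = host_header.split(".")[0]
        try:
            return TENANT_DATABASES[subdomain]
        except KeyError:
            raise LookupError(f"unknown tenant subdomain: {subdomain}")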

------
lukevp
It’s really best to design everything for a shared app, shared db model, and
to have a user/tenant hierarchy in every app. You can start with each user
belonging to their own tenant. You can in the future shard the db and/or app
by tenant for scalability, or you can provide a dedicated instance as required
for data residency or whatever.

------
amanzi
I quite like the Wordpress Multisite model which deploys a separate set of
tables for each blog in a single MySQL database. Then you can add on the
HyperDB plugin which lets you create rules to split the sets of tables into
different databases. This gives a lot of flexibility.

------
polote
Previous related discussion:
[https://news.ycombinator.com/item?id=23305111](https://news.ycombinator.com/item?id=23305111)
Ask HN: Has anybody shipped a web app at scale with 1 DB per account? 262
comments

------
FpUser
I design my products as multi-tenant (both code and database). This does not
mean, however, that the result cannot be used as if it were a bunch of
single-tenant instances. It is up to the client how they decide to deploy.

------
justincormack
No mention of sharding by usage pattern, which is the usual pattern at scale,
e.g. potentially partitioning the app and database differently for users with
different fanout, scale, or other properties that affect scaling.

------
kaydub
If you want to scale, you share the apps and databases.

Nothing worse than having to spin up more and more instances and manage more
and more hardware/virtual hardware/services as your customer base grows.

------
tappleby
Are there any strategies for migrating from a separate db per tenant to shared
db with scoped tenant_id? In this case each tenant would have overlapping
primary keys.

~~~
pegas1
You add the tenantId to each table and add it to the primary key of the table
too. Then you just merge.
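
A rough sketch of that merge with psycopg2 (table and column names are
placeholders; a real migration also has to handle foreign keys, sequences, and
downtime):

    # Run against each per-tenant database before copying its rows over.
    PER_TENANT_DDL = """
    ALTER TABLE orders ADD COLUMN tenant_id int;
    UPDATE orders SET tenant_id = 42;              -- this tenant's id
    ALTER TABLE orders ALTER COLUMN tenant_id SET NOT NULL;
    ALTER TABLE orders DROP CONSTRAINT orders_pkey;
    -- Old per-tenant ids stay valid because the tenant id is now part of the key.
    ALTER TABLE orders ADD PRIMARY KEY (tenant_id, id);
    """

    def copy_into_shared(tenant_conn, shared_conn):
        """Stream one tenant's rows into the shared database."""
        with tenant_conn.cursor() as src, shared_conn.cursor() as dst:
            src.execute("SELECT tenant_id, id, total FROM orders")
            for row in src:
                dst.execute("INSERT INTO orders (tenant_id, id, total) "
                            "VALUES (%s, %s, %s)", row)
        shared_conn.commit()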

~~~
iovrthoughtthis
“Just”. Depends on the number of tenants and how divergent they are.

I would incrementally bring tenants together into a single DB.

~~~
tappleby
There are also some limitations if using something like Rails / ActiveRecord,
which doesn't support composite primary keys natively.

------
tracer4201
I worked on a system with the shared app + shared database model. At its core,
we received events (5-10KB) from customers and did something with those
events. In total, we were receiving 8K-10K events per second.

In terms of security and privacy, isolating an individual tenant from others
wasn't so much of a concern as each tenant was a customer within the
organization with the same data classification level. So from a security
perspective, we were "okay".

Where this gets interesting is that one tenant would suddenly decide to push a
massive volume of data. Now processing events within a specific SLA was a
critical non-functional requirement with this system. So then our on-call
engineers would get alerts because shared messaging queues were getting backed
up since Mr. Bob had decided to give us 3-5x his typical volume.

The traffic spike from one customer, which could last from minutes to hours,
would negatively impact our SLAs with the other customers. Now all the
customers would be upset. ^_0

Being internal customers, they were willing to pay for the excess traffic, but
we didn't really have the tooling to auto scale. Our customers also didn't
want us to rate limit them. Their expectation was that when they have traffic
spikes, we need to be able to deal with it.

Now – we didn't want to run extra machines that sat idle like 90% of the time.
And when we had these traffic spikes, we'd see our metrics and find maxed out
CPU, memory, and even worse, we'd consume all disk space from the log volume
on the machines filling everything up. The hosts would become zombies until
someone logged in and manually freed up disk.

There were a few lessons learned:

1. Rate limit your customers (if your organization allows).

2. If your customers are adamant that in some instances each month, they need
to be able to send you 5x the traffic without any notice, then you can't just
rate limit them and be done with it. We adopted a solution where we would let
our queues back up while some monitors would detect the excessive CPU or
memory usage and would start scaling out the infrastructure. Once our monitors
saw the message queues were looking normal again, they'd wait a little while
and then scale back down.

3. When you're processing from a message queue, you need to capture metrics
to track which customer is sending you what volume. Alternatively, you can
have metrics on the message queues themselves and have one queue per customer.

4. If it's a matter of life and death (it wasn't, but that's how one customer
described it), something you can do is stop logging when disk space usage
exceeds a specific amount (see the sketch after this list).

5. Also – when you have a high throughput system, think very carefully about
every log statement you have. What is its purpose? Does it really add value?
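
A sketch of lesson 4 as mentioned above: a logging filter that drops records
once the log volume is nearly full (the path and threshold are illustrative,
and a real system would cache the disk check rather than run it per record):

    import logging
    import shutil

    class DiskSpaceFilter(logging.Filter):
        """Drop log records instead of filling the disk and zombifying the host."""

        def __init__(self, path="/var/log", min_free_bytes=500 * 1024 * 1024):
            super().__init__()
            self.path = path
            self.min_free_bytes = min_free_bytes

        def filter(self, record):
            return shutil.disk_usage(self.path).free > self.min_free_bytes

    logging.getLogger().addFilter(DiskSpaceFilter())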

~~~
user5994461
I'd advise setting up classic rate limiting on the front load balancers,
something like 6k requests per minute per IP.

This works really well to stop clients from doing this sort of thing in the
first place. Client devs get 429 errors for spamming the shit out of your
server, they add some sleep to spread out the requests a bit. Everybody wins.

You will never hear any complaint about it, it's infinitely easier for the
customer to add a sleep than to figure out how to contact your support.
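
A minimal fixed-window version of that per-IP limit (the numbers come from the
comment above; in practice this usually lives at the load balancer, e.g. an
nginx limit_req zone, rather than in app code):

    import time
    from collections import defaultdict

    LIMIT = 6000        # requests per window per IP
    WINDOW = 60         # seconds

    _counts = defaultdict(int)
    _window_start = defaultdict(float)

    def allow(ip, now=None):
        """Return False when the caller should get an HTTP 429."""
        now = time.time() if now is None else now
        if now - _window_start[ip] >= WINDOW:
            _window_start[ip] = now
            _counts[ip] = 0
        _counts[ip] += 1
        return _counts[ip] <= LIMIT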

------
yelloweyes
> Separate Apps & Separate Databases

> Suitability: This is the best way to begin your SAAS platform, for product-
> market fitment until stability and growth.

Uh what? How is this even feasible when you get to 1000s of clients?

~~~
cgrealy
I guess it depends on your customer/business model, but surely the keyword
there is "begin"?

If you're expecting to get to 1000s of clients quickly (i.e. within 12
months), then this will be an ops nightmare, unless you have really good
automation.

