Shipping Multi-Tenant SaaS Using Postgres Row-Level Security

mkurz · on July 26, 2022

Be aware when using RLS with views: By default the RLS policy will be executed with the permissions of the owner of the view instead with the permissions of the user executing the current query. This way it can easily happen that the RLS policy will be bypassed because the owner of the view is a admin account or the same account that owns the underlying table (see the the gotchas section of the original post).

However, upcoming PostgreSQL 15 adds support for security invoker views: https://github.com/postgres/postgres/commit/7faa5fc84bf46ea6... That means you can then define the security_invoker attribute when creating a view and this "... causes the underlying base relations to be checked against the privileges of the user of the view rather than the view owner" (see https://www.postgresql.org/docs/15/sql-createview.html) PG15 beta 1 release notes: https://www.postgresql.org/about/news/postgresql-15-beta-1-r...

markhalonen · on July 26, 2022

This was a shock to us after using RLS for a while. The solution outlined here worked great for us: https://www.benburwell.com/posts/row-level-security-postgres...

infogulch · on July 27, 2022

So they create table-valued functions which support the "SECURITY INVOKER" security context, and then select from that function to form the view. I suppose there's a feature request somewhere to support the "SECURITY INVOKER" feature for views directly?

infogulch · on July 27, 2022

Well well lookie here:

commitdiff 2022-03-22: Add support for security invoker views. - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

discussion 2021-12-17: [PATCH] Add reloption for views to enable RLS - https://www.postgresql.org/message-id/b66dd6d6-ad3e-c6f2-8b9...

explanatory blog post: 2022-03-22: Waiting for PostgreSQL 15 – Add support for security invoker views. - https://www.depesz.com/2022/03/22/waiting-for-postgresql-15-...

This seems to be slated for PG15: https://www.postgresql.org/docs/15/release-15.html#id-1.11.6...

> E.1.3.1.6. Privileges: Allow view access to be controlled by privileges of the view user (Christoph Heiss) Previously, view access could only be based on the view owner.

Syntatically it will look like:

    CREATE VIEW vista WITH (security_invoker=true) AS SELECT 'Hello World';

alberth · on July 27, 2022

That seems like a bug to me, and a significant one as well.

The underlining promise of RLS (sometimes even referred to as “virtual private database”) in an RDBMS, is that data should never leak because it’s handled transparently by the db.

This seems like a significant leakage point that the user has to personally manage.

Spivak · on July 27, 2022

This pretty much mirrors stored procedures though which have the option of running as the definer or the invoker. Breaking this with “when RLS is enabled stored procedures and views set to run with the permissions of the definer intersect with the RLS policy of the invoker” is crazy weird.

Maybe it would actually be good behavior but it would super super unintuitive.

alberth · on July 27, 2022

Isn’t it also super unintuitive that developers will have to manage partitioning customer data at the Application Layer even though they think they are using a Data Layer security policy.

When RLS is enabled, their should never be a situation where Customer A might be able to gain access to Customer B data. That’s literally the entire objective of the feature.

This is giving people a false sense of security, and for multi-tenant applications has massive consequences since this will leak data.

mixmastamyk · on July 28, 2022

Assuming correct, it's an absolutely horrible default that needs to be changed.

Imagine if the default Unix permission was to set files with setuid.

bearjaws · on July 26, 2022

This is such a killer feature in PG, my new job uses it and it makes audits of our tenancy model dead simple.

Coming from a SaaS company that used MySQL, we would get asked by some customers how we guarantee we segmented their data, and it always ended at the app layer. One customer (A fortune 10 company) asked if we could switch to SQL Server to get this feature...

Our largest customers ask how we do database multi-tenant and we point to our SDLC + PG docs and they go 'K'.

e1g · on July 26, 2022

Every B2B client who asked us how we handle multi-tenancy also asked how we ensure their data is erased at the end of the contract. Using a shared database with RLS means you have to go through all DB backups, delete individual rows for that tenant, then re-generate the backup. That’s a non-starter, so we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier.

brightball · on July 26, 2022

I filled out a ton of enterprise questionnaires on this stuff before and we just told people that it would be deleted when the backups expired after X days because we didn't have the capability to delete specific rows from our backups. Nobody ever argued.

There's not a single customer I've ever run across who's going to halt a contract because you can't purge their data from your backups fast enough. They're signing up because of what you offer, not the termination clause.

e1g · on July 26, 2022

A recent example from the other side: a client contacts me and says they will have to exit from our existing contract unless we can update our (AWS) infrastructure to use their (AWS) encryption keys for servers and databases handling their tenancy. In Enterprise, some tenants are very opinionated about what cloud you use and how their data lives/flows within it. I run all our infosec, including SOC2 & ISO27001 programs, and I know that using their encryption keys is nothing but security theater. But with $500k p.a. on the line, I also know when it's showtime.

brightball · on July 26, 2022

Fwiw, we also priced in dedicated database instances for people who wanted it where we had that capability. For the extra costs, nobody ever took us up on it.

victor9000 · on July 27, 2022

This is the right answer. You always say yes, and tell them how much it will cost. Then you can sit back and enjoy the sweet sound of crickets.

mixmastamyk · on July 28, 2022

Or "cha-ching!"

POPOSYS · on July 27, 2022

Would you please like to explain why "using their encryption keys is nothing but security theater"? Thanks!

X-Istence · on July 27, 2022

If you give me access to use your KMS key to encrypt/decrypt an EBS volume, and grant me access to that EBS volume/mount it to an AWS EC2 instance I manage, I can read/write data on that EBS volume all day long.

The fact that you own the KMS key doesn't stop me from reading/writing that EBS volume. It doesn't offer any additional security guarantees, especially if I was already encrypting the EBS volumes with a KMS key.

If I am a SaaS offering, whether I use a KMS key I own or one you own doesn't change the fact that I still have access to all of the unencrypted data that is silently being encrypted by those KMS keys, I've got access to the layer above it.

Sure, if the contract ended they could revoke the KMS key and now the data on the EBS volumes is no longer readable by me, but any backups I have of that data is still within my purview.

vlovich123 · on July 27, 2022

Human systems (contracts/legal) step in when technological systems can’t. Until there’s homomorphic encryption, enforcing that you’re using their key can give them piece of mind that they can revoke it. While true that you could be doing anything with that data, a contract in good standing and normal human ethics probably adds a high degree of likelihood that you’re not. However, if the relationship sours, they want the freedom to revoke quickly without needing your good will. If your backups aren’t using their keys, I think you’d be violating the contract.

But yes, from a purely technological perspective security theater. They could also misunderstand what’s happening and it’s also not worth it to try to explain for you at the risk of losing the contract.

X-Istence · on July 27, 2022

I don't know how this is for others, but in the environments I am in we use a different KMS key for our backups so that if something were to happen, like a mis-click in a web interface or an accidental terraform destroy, we can recover the data.

It is also stored in a different location than the original (different AWS account).

buzer · on July 27, 2022

The only real extra protection that I can think of that they get is that they can revoke the access at any time (e.g. right after terminating contract). If someone malicious gets into the provider's AWS account it's highly likely that they can get the access to the running machines and extract the data. I guess if attacker is stupid and just tries to restore the encrypted backup it could alert the customer instead of just the provider assuming that's a non-standard operation.

If there was a very deep integration, essentially column-level, it could make a lot more sense, but that could give quite a big performance & cost hit as well as limit what you can do (essentially you wouldn't be able to do any matching on content level).

e1g · on July 27, 2022

Other comments from X-Istence, vlovich123, and buzer are correct. If our apps/admins can get to client data, using their keys does not help to mitigate any plausible attack vector. If we become hostile or compromised, their data is toast anyway. Adding this technical control increases friction and makes the overall system more rigid/brittle but does nothing to increase security. Hence "security theater".

Other examples of such meaningless-and-impotent controls in infosec include "must run a firewall on your Linux server on a private subnet", and any policy requiring password complexity/rotation. But as a business, it's more productive for us to just tick the checkboxes than to spend resources on educating the market about the actual best practice (unless you're in the business of education).

LAC-Tech · on July 27, 2022

Interesting that they are so privy to how you administer stuff. Would they have terminated on the spot if you had been using azure or google or digital ocean? Or was using AWS in the initial contract?

e1g · on July 27, 2022

We had cases when a prospect said we must use AWS (because that's what they use, and are comfortable with how our data bridges would work/integrate), and when a prospect said we must not use AWS (because they are in retail and want to avoid feeding the 1,000-pound gorilla that's mauling them).

I don't want to imply this is common - it's not. Such cases come from the "arbitrary enterprise rules" side rather than the "SaaS/tech" side.

voberoi · on July 26, 2022

This is the way -- also never had an issue across US healthcare and enterprise SaaS.

coenhyde · on July 26, 2022

i did the same with the same results

wahnfrieden · on July 26, 2022

same. but! it’s a liability nonetheless, go talk with legal etc

mikeodds · on July 26, 2022

thirded

bearjaws · on July 26, 2022

We usually write a "reasonable best effort" clause into our deletion, that it will 100% be deleted from production within 30 days and automatically fall out of backups 60 days from there. This also helps since we can't control our downstream vendors such as Twilio, AWS SES, etc, who all have their own legal obligations and time frames.

Even for large health systems they have been okay with it.

jerryjerryjerry · on July 26, 2022

I think TTL feature provided by some DB vendors are actually orthogonal to multi tendency, where the former deals with life cycles policy of data but the original problem to delete the data of certain user is more related to privacy policy of data, though overlap may exists.

Dave3of5 · on July 27, 2022

> we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier

When you say 1 DB I suspect you mean you have a single DB Server and multiple DB's on that server. Then I don't think this really solves the data-residency problem as the clients data is just in a different DB but still on the same instance. It makes other problems for you as well for example you now have 2 DB's to run maintenance, upgrades, data migrations on. Current company uses a similar model for multiple types of systems and it makes upgrading the software very difficult.

It also makes scaling more difficult as instead of having a single DB cluster that you can tweek for everyone you'll need to tweek each cluster individually depending on the tenants that are on those clusters. You also have a practical limit to how many DB's you can have on any physical instance so your load balancing will become very tricky.

There are other problems it causes like federation which Enterprise Customers often want.

hobs · on July 26, 2022

Agree - performance on row level security (at least on SQL Server) is terrible, sharing by database is fairly easy.

axlee · on July 26, 2022

It makes BI work an absolute hellscape as well. Tradeoffs.

mrtesthah · on July 27, 2022

Just make the application collect the statistics you need. Put them into a separate shared DB or use one of the many existing off the shelf SaaS collection frameworks.

spiffytech · on July 27, 2022

> we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier

I'm surprised this simplified balancing for you. When I worked somewhere with per-customer DBs, we had constant problems with rebalancing load. Some customers grew too big for where we put them, some nodes usually performed fine until the wrong subset of customers ran batch jobs simultaneously, etc.

jquery_dev · on July 26, 2022

How do you manage your backend in this case? Do you have an insurance of backend for each customer or do you allow backend to make connections to all the DBs.

I'm interested in doing similar and wondering about the best way to handle the routing between the databases from a single backend.

mason55 · on July 26, 2022

It really depends on your requirements, both functional and cost. Having a full stack per customer can be great for a lot of reasons. It's probably the safest because you never have to worry about something getting messed up in the code and creating cross-customer access. Once you're sure the environment is set up correctly you can sleep well at night. You also don't have to worry about one customer doing something to impact the performance of other environments (unless you're using shared infra, like putting all your DBs on a single cluster). And it can make maintenance easier, for example you can data migrations can start with small and/or less important customers for practice. It also can give you more flexibility if you need to make special snowflakes (i.e. some big customer wants weird IP whitelisting rules that can't work with your normal env setup).

Downsides are that it's probably more expensive and more work. Even if your infra spin up is totally automated, you still need to keep track of all the environments, you still need to keep your Infrastructure-as-Code (e.g. your terraform scripts) up to date, more can go wrong when you make changes, there's more chance for environments to drift.

So, in short, separate stacks usually means more safety & simpler application architecture in exchange for more cost and more effort to manage the fleet.

gervwyk · on July 26, 2022

Also second this, we even split our AWS org into an AWS account per tentant. Although, this will maybe be a problem if we have +100s of clients. But it makes onboarding and off-loading simple.

HatchedLake721 · on July 26, 2022

It depends on an annual contract value (ACV), doesn't it? You can't give an AWS account to every $99 p/m plan, but you can for enterprise $50-100k+ deals.

gervwyk · on July 26, 2022

Yeah, that sums it up. I guess it means it can't be labelled as "multi-tenant" then..

jamesfinlayson · on July 27, 2022

Is account creation automated?

I know there are some resources where you can only have one per region (I think you can only have one AWS::EC2::VPCEndpoint per... type and service per region) but I don't know if letting multiple tenants use the same VPC endpoint is a risk or not.

X-Istence · on July 27, 2022

You can have an instance of a VPC Endpoint per VPC. You can have multiple VPC's all with the same VPC Endpoints.

You just get billed for them ;-)

jamesfinlayson · on July 29, 2022

Oh yes that makes complete sense - I'm living in a world where our internal AWS management team deploys our VPCs for us (one per account unless you have very special needs).

throw03172019 · on July 26, 2022

Each client is running on their own instances / load balancers?

BeefWellington · on July 27, 2022

One DB (or perhaps schema, depending on DBMS) per tenant is the way to go IMO. It simplifies so much of your work, from DR to deletion to scaling.

SahAssar · on July 26, 2022

I think the way to handle this (based on how many companies handle GDPR compliance) is to not keep backups older than X months (usually 3 months) and have a clause that all data past that time is deleted.

eastbound · on July 26, 2022

I honestly don’t understand how Oracle is still alive. Postgres has so many of these killer features.

Also, I wonder how others do tenant separation, what other solutions there are.

mritchie712 · on July 26, 2022

Legacy.

If you have thousands of lines of code relying on Oracle the cost to migrate would be enormous.

gav · on July 26, 2022

Ignoring the cost, there's the risk/reward alignment you see in large enterprises.

Imagine you're a new CIO. You know you're probably looking at a 3-5 year tenure at this new company and you want to lead with some big wins to set the tone and show your value.

You're reviewing proposals from your senior leadership. One of the options is an Oracle migration. It could cost a million dollars to migrate, but you'd save a million dollars a year going forward. Oracle runs your mission-critical internal systems, any issues with the migration and the system you migrate to is going to cause significant financial and reputation damage. You'll have to defend this decision if anything goes wrong, i.e. you've absorbed a lot of risk but a lot less upside to you personally.

What do you do? You put the proposal to the side and look for something that has a lot better upside.

beckingz · on July 26, 2022

Exactly. The risk/cost profile for migrations is bad: If it goes well, decent return. If it goes poorly, catastrophic.

slt2021 · on July 27, 2022

lift and shift migrations dont really make sense. It would make sense to do architectural rewrite, from on-prem Oracle monolith to cloud native serverless stack for example. Digital transformation, yeah

kennethh · on July 27, 2022

I worked in a bank previously and we migrated all our databases from Oracle to MS SQL Server. I think we used like 7-8 years to do it so I can understand people who are hesistant to convert. I think the advantages at the time (this was 10 years ago) was lower price for the db servers but also more people who are familiar with Sql Server compared to Oracle.

toomuchtodo · on July 26, 2022

Amazon has a great post on this topic.

https://aws.amazon.com/blogs/aws/migration-complete-amazons-...

I thought it was cool they retrained their Oracle DBAs into other roles as part of the project.

spacemanmatt · on July 26, 2022

I work with a few former-Oracle DBAs in a PostgreSQL-flavored consultancy now and they are aces. All the root-cause analysis and organization skills transfer handily.

paulmd · on July 26, 2022

Postgres is functionally and conceptually extremely similar to Oracle. There are a few oddities (in particular, oracle's "nulls are never in indexes" is kinda weird) but the redo log is similar to the WAL, etc. In most cases, similar approaches will perform similarly and experience pretty much transfers over with a few months of experience.

abraae · on July 26, 2022

Oracle has had the ability to do this for decades ("virtual private database"), so whatever is keeping them alive, it's nothing to do with this particular nifty Postgres feature.

ibejoeb · on July 27, 2022

PostgreSQL is a great product, but Oracle has so many more features. Even just in the query language. There's so much power embedded in just the model clause that would be all custom software in Postgres.

If you buy Oracle, you should use Oracle. Like really lean into it. If you really need it, it will be worth the money. I don't like dealing with Oracle sales, but the product is killer.

throwaway787544 · on July 26, 2022

They're like a tick. Very good at burrowing in and hard to remove. They have a lot of clients for whom a dozen million dollars is a drop in the bucket, and moving away is a decade-long millions-of-dollars project.

revskill · on July 26, 2022

Sharding is the only scalable way per my experience. The point about scalability here is, i can control the load as the data gets bigger.

Scarbutt · on July 26, 2022

oracle has flashback

gz5 · on July 26, 2022

And PG supports layer 3 shut-down of link listeners and inbound fw ports. So you can combine the L7 tenancy with a secure networking architecture which eliminates the problems of managing firewalls and ACLs. One of the open source examples: https://youtu.be/s-skpw7bUfI

ksec · on July 27, 2022

Fortune Top 10

1 Walmart

2 Amazon

3 Apple

4 CVS Health

5 UnitedHealth Group

6 Exxon Mobil

7 Berkshire Hathaway

8 Alphabet

9 McKesson

10 AmerisourceBergen

We can rule out 2,3,7,8 …

simonw · on July 26, 2022

I don't fully understand the performance implications here.

Say I was using this for a blog engine, and I wanted to run this SQL query:

    select * from entries;

But I actually only want to get back entries that my current user is allowed to view - where author_id = 57 for example.

Would PostgreSQL automatically turn the above query into the equivalent of this:

    select * from entries where author_id = 57;

And hence run quickly (assuming there's an index on that author_id column)?

Or would it need to run an additional SQL query check for every single row returned by my query to check row permissions, adding up to a lot of extra overhead?

ossopite · on July 26, 2022

yes, postgres will add such a condition to the query and in simple cases like this is able to use a corresponding index

unfortunately this can break down in more complex cases. roughly postgres trusts a limited set of functions and operators not to leak information about rows (e.g. via error messages) that the RLS policy says a query should not be able to see. that set includes basic comparisons but not more esoteric operations like JSON lookups. at some point postgres will insist on checking the RLS policy result for a row before doing any further work, which can preclude the use of indexes

lmeyerov · on July 26, 2022

We were looking at RLS, various ABAC integrated frameworks (casbin, ..), and zanzibar clones late last year --

* RLS is super appealing. Long-term, the architecture just makes so much more sense than bringing in additional maintenance/security/perf/etc burdens. So over time, I expect it to hollow out how much the others need to do, reducing them just to developer experience & tools (policy analysis, db log auditing, ...). Short-term, I'd only use it for simple internal projects because cross-tenant sharing is so useful in so many domains (esp if growing a business), and for now, RLS seems full of perf/expressivity/etc. footguns. So I wouldn't use for a SaaS unless something severely distinct tenant like payroll, and even then, I'd have a lot of operational questions before jumping in.

* For the needed flexibility and app layer controls, we took the middle of casbin, though others tools emerging to. Unlike the zanzibar style tools that bring another DB + runtime + ..., casbin's system of record is our existing system of record. Using it is more like a regular library call than growing the dumpster fire that is most distributed systems. Database backups, maintenance, migrations, etc are business as usual, no need to introduce more PITAs here, and especially not a vendor-in-the-middle with proprietary API protocols that we're stuck with ~forever as a dependency.

* A separate managed service might make zanzibar-style OK in some cases. One aspect is ensuring the use case won't suffer the view problem. From there, it just comes down to governance & risk. Auth0 being bought by Okta means we kind of know what it'll look like for awhile, and big cloud providers have growing identity services, which may be fine for folks. Startup-of-the-month owning parts of your control plane is scarier to me: if they get hacked, go out of business, get acquired by EvilCorp or raise $100M in VC and jack up prices, etc.

There's a lot of innovation to do here. A super-RLS postgres startup is on my list of easily growable ideas :)

On a related note: We're doing a bunch of analytics work on how to look at internal+customer auth logs -- viz, anomaly detection, and supervised behavioral AI -- so if folks are into things like looking into account take overs & privilege escalations / access abuse / fraud in their own logs, would love to chat!

jzelinskie · on July 26, 2022

As the developer of an external authorization system (full disclosure)[0], I feel obligated to chime in the critiques of external authorization systems in this article. I don't think they're far off base, as we do recommend RLS for use cases like what the article covers, but anyways, here's my two cents:

1+2: Cost + Unnecessary complexity: this argument can be used against anything that doesn't fit the given use case. There's no silver bullet for any choice of solution. You should only adopt the solution that makes the most sense for you and vendors should be candid about when they wouldn't recommend adopting their solution -- it'd be bad for both the users and reputation of the solution.

3: External dependencies: That depends on the toolchain. Integration testing against SpiceDB is easier than Postgres, IMO [1]. SpiceDB integration tests can run fully parallelized and can also model check your schema so that you're certain there are no flaws in your design. In practice, I haven't seen folks write tests to assert that their assumptions about RLS are maintained over time. The last place you want invariants to drift is authorization code.

4: Multi-tenancy is core to our product: I'm not sure I'm steel-manning this point, but I'll do my best. Most companies do not employ authorization experts and solutions worth their salt should support modeling multi-tenant use cases in a safe way. SpiceDB has a schema language with idioms and recommendations to implement functionality like multi-tenancy, but still leaves it in the hands of developers to construct the abstraction that matches their domain[2].

[0]: https://github.com/authzed/spicedb

[1]: https://github.com/authzed/examples/tree/main/integration-te...

[2]: https://docs.authzed.com/guides/schema

gwen-shapira · on July 26, 2022

The blog explicitly said that if the requirements involve actual authorization models (beyond simple tenancy) then RLS is not the best fit (see: https://thenile.dev/blog/multi-tenant-rls#if-you-have-sophis...).

I think this covers both the complexity aspect and the difference between what you get from RLS and what external authz brings to the table (schema, for example).

I do think that RLS is a great way for a company without authz experts to built a multi-tenant MVP safely. I've yet to see a single pre-PMF company that worries about authorization beyond that, this is a series-B concern in my experience.

semitones · on July 26, 2022

Btw that's a localhost link

gwen-shapira · on July 26, 2022

oops :) Too many tabs. Fixed and thank you.

shaicoleman · on July 26, 2022

We're currently using the schema-per-tenant, and it's working very well for us:

* No extra operational overhead, it's just one database

* Allows to delete a single schema, useful for GDPR compliance

* Allows to easily backup/restore a single schema

* Easier to view and reason about the data from an admin point of view

* An issue in a single tenant doesn't affect other tenants

* Downtime for maintenance is shorter (e.g. database migration, non-concurrent REINDEX, VACUUM FULL, etc.)

* Less chance of deadlocks, locking for updates, etc.

* Allows easier testing and development by subsetting tenants data

* Smaller indexes, more efficient joins, faster table scans, more optimal query plans, etc. With row level security, every index needs to be a compound index

* Easy path to sharding per tenant if needed. Just move some schemas to a different DB

* Allows to have shared data and per-tenant data on the same database. That doesn't work with the tenant-per-database approach

There are a few cons, but they are pretty minor compared to the alternative approaches:

* A bit more code to deal in the tenancy, migrations, etc. We opted to write our own code rather than use an existing solution

* A bit more hassle when dealing with PostgreSQL extensions . It's best to install extensions into a separate extensions schema

* Possible caching bugs so you need to namespace the cache, and clear the query cache when switching tenant

* The security guarantees of per tenant solution aren't perfect, so you need to ensure you have no SQL injection vulnerabilities

bvirb · on July 26, 2022

We ran a multi-tenant SaaS product for years w/ a schema-per-tenant approach. For the most part it all worked pretty great.

We ran into issues here and there but always found a way to work around them:

* Incremental backups were a pain because of needing to lock so many objects (# of schemas X # of tables per schema).

* The extra code to deal w/ migrations was kinda messy (as you mentioned).

* Globally unique IDs become the combination of the row ID + the tenant ID, etc...

For us though the real deal-breaker turned out to be that we wanted to have real foreign keys pointing to individual rows in tenant schemas from outside of the tenant schema and we couldn't. No way to fix that one since with multi-schema the "tenant" relies on DB metadata (the schema name).

We ended up migrating the whole app to RLS (which itself was a pretty interesting journey). We were afraid of performance issues since the multi-schema approach kinda gives you partitioning for free, but with the index usage on the RLS constraints we've had great performance (at least for our use case!).

After quite a bit of time working with both multi-schema & RLS I probably wouldn't go back to multi-schema unless I had a real compelling reason to do so due to the added complexity. I really liked the multi-schema approach, and I think most of the critiques of it I found were relatively easy to work around, but RLS has been a lot simpler for us.

POPOSYS · on July 27, 2022

Would you like to explain the deal-breaker a little bit more? I do not understand the limitation you hit. It seems like it is no problem to access foreign keys in a different schema as long permissions do allow that. Thanks!

uhoh-itsmaciek · on July 26, 2022

There are some other cons:

Memory usage and I/O can be less efficient. Postgres handles table data in 8kb pages, so even if you're just reading a single row, that reads 8kb from disk and puts 8kb in the Postgres buffer cache, with that row and whatever happens to be next to it in the physical layout of the underlying table. Postgres does this because of locality of reference: it's cheaper to bulk-load data from disk, and, statistically speaking, you may need the adjacent data soon. If each user is touching separate tables, you're loading a page per row for each user, and you're missing out on some of the locality benefits.

Another problem is monitoring (disclosure: I work for pganalyze, which offers a Postgres monitoring service). The pg_stat_statements extension can track execution stats of all normalized queries in your database, and that's a very useful tool to find and address performance problems. But whereas queries like "SELECT * FROM posts WHERE user_id = 123" and "SELECT * FROM posts WHERE user_id = 345" normalize to the same thing, schema-qualified queries like "SELECT * FROM user_123.posts" and "SELECT * FROM user_345.posts" normalize to different things, so you cannot easily consider their performance in aggregate (not to mention bloating pg_stat_statements by tracking so many distinct query stats). This is the case even when you're using search_path so that your schema is not explicitly in your query text.

Also, performance of tools like pg_dump is not great with a ton of database objects (tables and schemas) and, e.g., you can run into max_locks_per_transaction [1] limits, and changing that requires a server restart.

I wouldn't say you should never do schema-based multi-tenancy (you point out some good advantages above), but I'd be extremely skeptical of using it in situations where you expect to have a lot of users.

[1]: https://www.postgresql.org/docs/current/runtime-config-locks...

andy_ppp · on July 27, 2022

I find adding loads of stuff to Postgres exciting and fun, but I want all of my logic in the code in GitHub, rather that floating around in my global data store. Has anyone thought about a data layer that allows you to define this stuff programmatically rather than in SQL but then it configures your data layer to work like this. Not necessarily an ORM but more a business logic layer that compiles everything down to use features like this. Or maybe even a data layer that is a set of programmatic building blocks that works as described?

hamandcheese · on July 27, 2022

Flyway is a plain-sql migration tool with support for “repeatable migrations” which somewhat do what you want:

https://flywaydb.org/documentation/tutorials/repeatable

macNchz · on July 27, 2022

I’ve set something like that up a handful of times in a kind of ad-hoc manner, by subclassing/extending the autogeneration tools from existing db migration frameworks to just detect changes in a directory of .sql files. Has worked pretty well to keep stored procedures/triggers/materialized views up to date with the repo.

lmeyerov · on July 27, 2022

Yep -- postgres enforces, while config-as-code can inject as usual bc the whole point is it is just SQL, so policy changes are just migrations (and subject to SDLC)

mixmastamyk · on July 28, 2022

You're describing an ORM, or perhaps SQLAlchemy, which has a lower level interface. .sql files work fine in version control as well. "create or replace …" pattern can make them idempotent.

simlevesque · on July 27, 2022

Supabase kinda does this.

LAC-Tech · on July 27, 2022

Does superbase basically give you a pgAdmin accessible instance to play with, then generate a JSON API using postgREST? It's not 100% clear to me from skimming the docs.

keybits · on July 27, 2022

uhoh-itsmaciek · on July 26, 2022

>Another issue we caught during testing was that some requests were being authorized with a previous request’s user id.

This is the terrifying part about RLS to me: having to rely on managing the user id as part of the database connection session seems like an easy way to shoot yourself in the foot (especially when combined with connection pooling). Adding WHERE clauses everywhere isn't great, but at least it's explicit.

That said, I've never used RLS, and I am pretty curious: it does seem like a great solution other than that one gotcha.

koolba · on July 27, 2022

If you do this right, it ends up in just one place and the code that handles checking in / out connections from your pool will handle wrapping your connection usage with the appropriate session context. And of course subsequently clearing it with a DISCARD ALL.

sgarman · on July 26, 2022

Am I right in my understanding that EVERY request that comes in to their api creates a new connection to the database? What about reusing connections with connection pools or one level up using pgbouncer or thing. Can you actually use RLS while reusing connections?

lemax · on July 26, 2022

It's possible to implement this without creating new connections to the database for each request by using SET LOCAL and wrapping every query in a transaction. Instead of applying RLS based on the current user, you apply RLS based on the parameter value you set at the beginning of the transaction. You can set this parameter value based on the user session in your application.

Your RLS policy looks as follows: CREATE POLICY tenant_${tableName}_isolation_policy ON "${tableName}" USING ("tenant_id" = current_setting('app.current_tenant');

Your queries look something like this: BEGIN TRANSACTION SET LOCAL app.current_tenant = '${tenant}'; SELECT * from some_table END TRANSACTION;

You can even initialize your writes with a `tenant_id` column defaulted to your `current_setting('app.current_tenant')`

rst · on July 26, 2022

Nope. Quoting the article itself:

"In the traditional use case of direct db access, RLS works by defining policies on tables that filter rows based on the current db user. For a SaaS application, however, defining a new db user for each app user is clunky. For an application use case you can dynamically set and retrieve users using Postgres’ current_settings() function ( i.e:

  SET app.current_app_user = ‘usr_123’

and

  SELECT current_settings(‘app.current_app_user)

)."

The policies that they define reference these settings, so they can do a "set" at the start of processing every web request, on a pre-existing db connection.

cstejerean · on July 26, 2022

You can reuse the connection with a connection pool and use SET ROLE when you check it out.

ishanr · on July 27, 2022

I use RLS quite heavily for my app Sudopad (https://sudopad.com) and it has been working quite well so far.

One gotcha specific to Supabase (where I run the backend) is because there is no anonymous login in Supabase, turning on RLS and using database functions marked as security definers are the way to go. Otherwise there is no easy way of stopping a 'select * from x' since some rows might not have a user_id if they are anonymous and I still want people to access the row if they know a specific primary key uuid.

mgl · on July 26, 2022

Row-level security is always a tricky and hard to enforce assumption as this not how we relational databases really.

Much bigger fan of the approach described here:

Scalability, Allocation, and Processing of Data for Multitenancy

https://stratoflow.com/data-scalability-allocation-processin...

fswd · on July 26, 2022

I use this for a startup in a re-write of their solution. It simplifies my queries and mutations, and security concerns. It also drammatically reduces the complexity of my code. There's also ROLES (Guest/Public user, Authenticated, Admin) and combinding the roles with Row Level Security.

I like it so much I don't want to go back!

ei8ths · on July 26, 2022

I needed this two years ago, i was looking at this but couldn't figure out how to do it with a existing db connection pool to reuse connections. I might be migrating to this soon so that things will be more isolated from the tenants.

a-dub · on July 26, 2022

this is cool. next up, row level encryption with private information retrieval methods for enabling queries and searches homomorphically (on data encrypted by the client that the service provider never has a key for).

POPOSYS · on July 27, 2022

This is interesting - do you have any hits / docs on how to implement that?

a-dub · on July 27, 2022

it's an open area of research as far as i know. the technical terms are "private information retrieval", "private information query", "homomorphic encryption" and "secure multiparty computation."

it's some of the coolest stuff in cryptography but it also tends to be experimental or very computationally expensive. zero knowledge proofs are the simplest form i know of.

as far as i know, row level end to end encrypted databases with indices that preserve privacy do not exist yet, but i fully expect that they will.

xsreality · on July 27, 2022

Curious how advanced authorization (like ABAC) can be implemented with RLS. For example returning resources that are accessible to the team I belong to within the tenant.

spacemanmatt · on July 26, 2022

If I were leaning into RLS today I would do it through PostgREST

pikdum · on July 26, 2022

PostGraphile is also a really neat tool, using GraphQL instead of REST.

andrewstuart · on July 26, 2022

I once implemented RLS/Postgres for Django.

It worked pretty well.

The basic mechanism was to intercept all outbound SQL queries and wrap them in postgres environment variables that set up the RLS.

santa_boy · on July 27, 2022

This is awesome. Are there any similar features that can be implemented with Mariadb? One of my favorite products is integrated with Mariadb.

kache_ · on July 26, 2022

Context aware data access is really cool. And hard :)

jtwebman · on July 27, 2022

What about stop using ORM abstractions as an option then it is much harder to forget needed filters?

nbevans · on July 26, 2022

Using RLS to implement multi-tenancy is a terrible idea. Just deploy a database per tenant. It's not hard. Why overcomplicate it?

bicijay · on July 26, 2022

Deploying a database per tenant is not that easy. You have a lot of new overhead, migrations become a pain in the ass (already are) and a lot of other little problems...

I would say a database per tenant is overcomplicating it.

mbreese · on July 26, 2022

A database per tenant makes the rest of the workflow significantly easier though. No need to add clauses to SQL WHERE statements for users/groups. Queries are faster (less data). And data can be moved much easier between servers.

Yes, it does add extra overhead at account creation, during DB migrations, and for backups.

But if you don’t need cross-account or public data access, it can make life much easier.

zie · on July 26, 2022

> No need to add clauses to SQL WHERE statements for users/groups.

This is basically what RLS does for you. You specify the access and you specify the current user(via a connection, SET ROLE, etc). Then it does all that complicated query filtering stuff for you, to ensure you don't screw it up.

> Queries are faster (less data). And data can be moved much easier between servers.

Not really, the overhead is just different(and likely more of it) in your solution. It's not wrong nor is using RLS right.

nickjj · on July 26, 2022

> No need to add clauses to SQL WHERE statements for users/groups.

ORMs have support to add in these conditions automatically, you only have to define the condition in 1 function. This gives you the best of both worlds. You can use your FK based separation and have application level guarantees that your tables with a tenant id are scoped to a tenant by default and you have the option to query multiple tenants to do super admin level reporting across tenants by using explicit function calls that basically say "yes I know I'm bypassing the auto-enforced tenancy check".

There's no performance issues since it's all 1 schema and there's no nightmare or overhead of having to run a DB migration for each database.

I'm not sure why the author didn't include this in their blog post because it's a popular option in Rails, Flask (SQLAlchemy), etc..

Gigachad · on July 26, 2022

You end up building just a bit of automation around it and its fine. The migration isn't any harder, you just run an ansible job or something to roll it out to all databases.

pg_bot · on July 26, 2022

Deploying multiple databases is typically costly in the infrastructure as a service space. Plus you have more operational overhead in ensuring backups work and keeping things secure. It's much easier to use Postgres' schemas to segment the data within one single database. Frankly schemas are much easier to reason about, maintain, scale, and keep compliant than row level security.

icedchai · on July 26, 2022

I worked for a SaaS that did that. We had 1000's of clients. Migrations would take all afternoon. We had custom connection pools, custom migration runners, and other weird stuff (this was about 10 years ago.) It was way too complicated, especially since most of the tenants had very little data.

jasongi · on July 27, 2022

Only really worth it if each tenant is creating enough value to justify this. Hard to see how a SaaS product with a cheap/free tier would pull this off.

icedchai · on July 27, 2022

We did this for the free tier at a previous company. It was terrible in terms of overhead.

paxys · on July 26, 2022

Is having to write "SELECT [...] WHERE user_id=<123>" really considered a security hole? Isn't that how like every service in existence operates? Coming up with complicated auth systems and patterns just because you are scared you will accidentally skip that WHERE clause seems bizarre to me.

nordsieck · on July 26, 2022

> Is having to write "SELECT [...] WHERE user_id=<123>" really considered a security hole? Isn't that how like every service in existence operates? Coming up with complicated auth systems and patterns just because you are scared you will accidentally skip that WHERE clause seems bizarre to me.

Is having to avoid use after free really considered a security hole? Isn't that how like every program in existence operates? Coming up with complicated languages and frameworks just because you're scared you will accidentally use a variable after it's been freed seems bizarre to me.

As it turns out, humans are bad at being consistent, whereas computers are much better. Maybe this particularly solution isn't "the right thing", but it's at least an attempt at modifying the environment such that mistakes no longer happen. And at a meta level, that is precisely the right thing to do.

bvirb · on July 26, 2022

It's pretty nice using RLS that the entire query will follow the rules applied in the database. So for complex queries with say joins and/or subqueries they will all automatically follow the RLS policies as well. In our case we also have some global lookup tables that don't have RLS policies which can also be joined.

We've found it pretty nice to cut out a whole class of possible bugs by being able to defer it to the database level. At the application level we end up with a wrapper that sets (and guarantees unsetting) multi-tenant access to the correct tenant, and then we never have to add "tenant_id = ..." anywhere, regardless of the query. Regardless of whether we forget in some query (which we almost surely would), it cuts out quite a bit of extra code.

You can also do some cool stuff like add RLS policies for read-only multi-tenant access. Then you can query data across multiple tenants while enforcing that nothing accidentally gets written.

mbreese · on July 26, 2022

From my perspective, it isn’t the security aspects that are limiting, but the usability.

If you want to have any access controls that isn’t a simple user_id==123, SQL WHERE clauses can get complicated.

Users, groups, or any kind of fine grained access control can make simple queries non-trivial. It’s even worse if a user can be authorized to view data across different accounts.

lemax · on July 26, 2022

In my experience we've looked toward this kind of solution in a large legacy single-tenant application that wants to go multi tenant with more safety guarantees.

thrownaway561 · on July 26, 2022

I think this is mainly an issue when you're using RAW SQL statements. If you're using an ORM, there are many ways to add a where clause to the statements automatically without having to update your code every where.

galaxyLogic · on July 26, 2022

When you say 'user_id' do you mean each end-user of the system or each customer?

I assume you have a few customers and then very many users belonging to each customer.