I don't usually go on tirades like this, but this is what I do for a living.
You'll want to use schemas rather than databases. Not only is it significantly more convenient (i.e., you'll administer 1 database instead of n), it will let you scale. For instance, a Postgres instance has a pool of shared memory that it uses for caching, and you'll be able to optimize its use across all of your clients, rather than one by one.
Related is the impact on connection resources. A connection to a database may be used for multiple schemas, so you'll be able to make much better use of those connections. Suppose you have 1000 clients, each of which typically has 1-2 users on your application at any given time. In the multiple-database config, you'll need one connection pool per database, each pool holding at least 1 open connection, so even under no load you're maintaining 1000 connections. With the schemas approach, you can maximize throughput and size the pool proportionally to your active users rather than to the number of customers you have.
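To put numbers on it, here's a rough pool-sizing sketch; the 1000 customers and the 1.5x headroom figure are just the hypothetical numbers from the example above:

```python
# Rough pool-sizing arithmetic for the scenario above. Numbers are the
# hypothetical ones from the example (1000 customers, ~1-2 users each).

def min_connections_db_per_tenant(n_customers, min_pool_size=1):
    # One pool per database: even idle tenants hold connections open.
    return n_customers * min_pool_size

def min_connections_schema_per_tenant(peak_active_users, headroom=1.5):
    # One shared pool, sized to actual concurrency with some headroom.
    return int(peak_active_users * headroom)

# 1000 customers, but maybe only ~100 users active at peak:
assert min_connections_db_per_tenant(1000) == 1000    # always held open
assert min_connections_schema_per_tenant(100) == 150  # 85% fewer
```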
Oh, one more biggie: remember those caches? Prepared statements and their plans are typically per-session (per connection). So if you frequently run "select price from widgets where category=?", you'll want to parse, rewrite, plan, and cache that as infrequently as possible, so keep those connections and sessions open as long as possible to get the benefit of prepared statements. You will see load go down if you do this, from reduced CPU for parsing and plan generation, and reduced wait time since you won't be churning those caches as much.
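A quick illustration of the effect, using sqlite3 purely because it's self-contained; the Postgres analogue is PREPARE/EXECUTE (or a driver's prepared statements) at the session level, and the table here is made up:

```python
import sqlite3

# The driver keeps compiled statements in a per-connection cache, so
# repeated executions of the same SQL text with different parameters
# skip the parse/plan work -- the same win PREPARE gives you on Postgres
# for as long as the session stays open.
conn = sqlite3.connect(":memory:", cached_statements=128)
conn.execute("CREATE TABLE widgets (price REAL, category TEXT)")
conn.executemany("INSERT INTO widgets VALUES (?, ?)",
                 [(9.99, "gears"), (4.5, "sprockets")])

query = "SELECT price FROM widgets WHERE category = ?"
for category in ("gears", "sprockets", "gears"):
    rows = conn.execute(query, (category,)).fetchall()  # cached statement
```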
My understanding is that a schema is an abstract definition of what your database fields/relations are. So that's too vague for me.
Adding a customer_id field to every single table in the database would technically fit the definition of "schema", but I suspect you mean something else altogether. Please elaborate.
But you have a point: adding customer_id is not unreasonable. In fact, this is my preferred logical approach, but I'll typically only do this on Oracle. It's DRY, meaning that, in the abstract definition of schema, you're modeling your relations once. If you have widgets belonging to customers, you have one widgets relation containing a customer id, rather than one widgets relation in each customer schema. It's an elegant design, and it takes even greater advantage of the benefits I described earlier.
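For the record, the shared-relation design looks roughly like this (sketched with sqlite3 so it runs standalone; the table and column names are made up, and on Postgres you'd likely pair it with a composite index leading on customer_id, since every tenant query filters on it):

```python
import sqlite3

# "One relation, scoped by customer_id": a single widgets table shared
# by all tenants, rather than one widgets table per customer schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widgets (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        name        TEXT
    );
    CREATE INDEX widgets_customer_idx ON widgets (customer_id, name);
""")
conn.execute("INSERT INTO widgets (customer_id, name) VALUES (1, 'gear')")
conn.execute("INSERT INTO widgets (customer_id, name) VALUES (2, 'sprocket')")

# Every application query is scoped to one tenant:
rows = conn.execute(
    "SELECT name FROM widgets WHERE customer_id = ?", (1,)).fetchall()
# rows == [('gear',)]
```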
I'd advocate against it when building a multitenant SaaS application on Postgres if the following hold:
1. You have many, many customers
2. Each customer has a lot of data
1. You'll quickly hit a place where it's hard to throw more vertical processing power at it. You'll need a serious SAN with lots of spindles to be able to service many requests against a lot of data, and plenty of CPU and RAM to handle the grunt work. Let's face it: you're on Postgres because you're broke :) I really love it, but sometimes I'd rather buy a solution to a problem (e.g., VPD). However, if you're rolling out schemas, you can build a new database and move them discretely. This isn't automatic or transparent, but it's very clean and very easy.
2. You're likely dealing with clients with special needs. It might not be appropriate to roll out new features to every customer all at the same time. If you ALTER TABLE widgets... it's all or nothing. Sometimes that's great, but sometimes it's not, and in building multitenant SaaS apps, I find it's most often not. In repeating your design, you're buying the flexibility to deal with your customers on a more granular level.
3. Others have already mentioned this: sometimes it's easier to articulate the integrity of your system to technical auditors with a hard separation like independent schemas. It's a little easier to show that a particular database account has access only to a particular schema than it is to convince them that your code does a sufficient job of enforcing the rules. I've done a lot of work where regulatory compliance is a concern, and this is always an easier sell.
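On point 2, the granular rollout is just the same DDL replayed against whichever tenant schemas have opted in. A toy sketch (schema, table, and column names are all hypothetical):

```python
def per_schema_migration(schemas, ddl_template):
    # Replay one DDL statement against each opted-in tenant schema,
    # leaving everyone else's schema untouched.
    return [ddl_template.format(schema=s) for s in schemas]

# Roll a new column out to two early adopters, not all 1000 tenants:
stmts = per_schema_migration(
    ["acme", "globex"],
    "ALTER TABLE {schema}.widgets ADD COLUMN discount numeric;")
# stmts[0] == "ALTER TABLE acme.widgets ADD COLUMN discount numeric;"
```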
The thing is that most vendors have solutions to these problems, and if Postgres doesn't have one already built, you can probably build it from its primitives. This is just my general thinking under this particular set of circumstances.
Maybe I've misunderstood what "multi-tenant" means for the last 5 years, but I always thought of it as "multiple customers on a single installation/instance and using a single database" ... as in, every customer has some sort of account model and other models are scoped to the account.
I'm sure there are valid reasons for having a unique db per customer (we do it in one product) but it seems easier to maintain/backup a single one.
I don't really see much of a security benefit to "one DB per customer". The actual data will be stored on the same disks, the queries issued through the same DB processes. This seems like a huge amount of overhead for very little realized gain.
One of the nice things about enterprise software is that you can charge for things like this. If you have a large enough customer, even if you are hosting, you can simply forward to them your bill for a dedicated server.
Here's a Django Advent post about row-level/object permissions, published while they were ramping up for the 1.2 release:
The reason is probably a perceived improvement in security. I've worked in systems where they've done this for security reasons. I believe Joel Spolsky did it for this reason when he was betaing a release of a product (FogBugz?).
It certainly does jack up the maintenance overhead. I wouldn't design a system this way myself.
I'm really curious if this is in fact a common practice out there in the wide world.
A bit of automation would solve this problem, as far as I can see.
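For instance, nightly backups of n per-customer databases reduce to one loop that emits a pg_dump per database (the database names and output path here are made up):

```python
def backup_commands(databases, outdir="/backups"):
    # One pg_dump invocation per customer database; the same loop
    # pattern covers vacuuming, migrations, and other maintenance.
    return [f"pg_dump --format=custom --file={outdir}/{db}.dump {db}"
            for db in databases]

cmds = backup_commands(["cust_acme", "cust_globex"])
# cmds[0] == "pg_dump --format=custom --file=/backups/cust_acme.dump cust_acme"
```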
"I'm really curious if this is in fact a common practice out there in the wide world."
It is. I've worked for companies which do this, and have scaled with Oracle for several tens of thousands of customers (=databases).
Also, complex reports that require hand-coded SQL become that much simpler. (Compared to having all customers' data in a single database)
Or am I missing something?
There is also the customer perception thing, as petethomas said in this thread.
This has worked very well so far, it's very fast, scalable and has given us zero problems. Since all the tables are the same, I imagine it should work very well for the poster, with an appropriate index, too...
For a lot of B2B web applications aimed at serving "bigger" companies, customer adoption can still hinge on scotch-sipping and rounds of golf more than well-placed sign-up buttons or great viral marketing campaigns. It's been argued of late that maybe this part of the industry is becoming more "web-like", but in any case it's not very web-like yet, when it comes to sealing deals.
Many times the companies who were convinced by your talented sales team to buy your product eventually demand that the product be tailored to their (sometimes very company-specific) needs. If you want to keep the customer happy, or maybe just keep the customer, you have to keep customizing.
One could make the argument that the most talented sales teams are the ones that sign deals with customers who are willing to "stay on core", but over time as your customers' own businesses evolve, the challenge grows for everyone involved.
it's the ability to serve business customers who are willing to pay for heavy customizations and the extra support those entail