
Django and SaaS - neeleshs
http://tidbids.posterous.com/saas-with-django-and-postgresql
======
ibejoeb
No! Schemas!

I don't usually go on tirades like this, but this is what I do for a living.

You'll want to use schemas vs. databases. Not only is it significantly more
convenient (i.e., you'll administer 1 database instead of n), it will let you
scale. For instance, a postgres database has a pool of shared memory that it
uses for query plans, caching, etc., and you'll be able to optimize the use of
it across all of your clients, rather than one by one.

Related is the impact on connection resources. A connection to a database may
be used for multiple schemas, so you'll be able to make much better use of
those. Suppose you have 1000 clients, which each typically have 1-2 users on
your application when it's typically in use. In the multiple database config,
you'll need one connection pool per database, each pool having at least 1 open
connection, so that under no load you're still maintaining 1000 connections.
With the schemas approach, you can maximize the throughput and balance the
connections proportionally to your _active users_ rather than the number of
customers you have.

Oh, one more biggie: remember the shared buffers? Those are typically per-
session (connection). So if you frequently run "select price from widgets
where category=?", you'll want to parse, rewrite, plan, compile, and cache
that as infrequently as possible, so keep those connections and sessions open
as long as possible so you get the benefit of prepared statements. You _will_
see load go down if you do this, from reduced CPU for parsing and plan
generation, and reduced system wait since you won't be churning your shared
buffers as much.

~~~
phugoid
Could you please clarify what you mean by "schemas"?

My understanding is that a schema is an abstract definition of what your
database fields/relations are. So that's too vague for me.

Adding a customer_id field to every single table in the database would
technically fit the definition of "schema", but I suspect you mean something
else altogether. Please elaborate.

~~~
ibejoeb
Neeleshs' reply is what I meant. In Postgres, it's like a namespace.

But you have a point: adding customer_id is not unreasonable. In fact, this is
my preferred logical approach, but I'll typically only do this on Oracle. It's
DRY, meaning that, in the abstract definition of schema, you're modeling your
relations once. If you have widgets belonging to customers, you have a one
widgets relation containing a customer id, rather than one widgets relation in
each customer schema. It's an elegant design, and it takes even greater
advantage of the benefits I described earlier.

I'd advocate against it when building a multitenant SaaS application on
Postgres if the following hold: 1\. You have many, many customers 2\. Each
customer has a lot of data

The reasons:

1\. You'll quickly hit a place where it's hard to throw more vertical
processing power at it. You'll need a serious SAN with lots of spindles to be
able to service many request to a lot of data, and plenty of CPU and RAM to
handle the grunt work. Let's face it: you're on Postgres because you're broke
:) I really love it, but sometimes I'd rather buy a solution to a problem
(e.g., VPD). However, if you're rolling out schemas, you can build a new
database and move them discretely. This isn't automatic or transparent, but
it's very clean and very easy.

2\. You're likely dealing with clients with special needs. It might not be
appropriate to roll out new features to every customer all at the same time.
If you ALTER TABLE widgets... it's all or nothing. Sometimes that's great, but
sometimes it's not, and in building multitenant SaaS apps, I find it's most
often not. In repeating your design, you're buying the flexibility to deal
with your customers on a more granular level.

3\. Others have already mentioned this: sometimes it's easier to articulate
the integrity of your system to technical auditors with a hard separation like
independent schemas. It's a little easier to show that a particular database
account has access only to a particular schema than it is to convince him that
your code does a sufficient job of enforcing the rules. I've done a lot of
work where regulatory compliance is a concern, and this is always an easier
sell.

The thing is that most vendors have solutions to these problems, and if
Postgres doesn't have already built, you can probably build it from their
primitives. This is just my general thinking under this particular set of
circumstances.

~~~
daredevildave
Is there a solution like schemas for MySQL?

~~~
pilif
no, but you can achieve the nearly same thing using multiple databases in
MySQL. Databases in MySQL are like schemas in PostgreSQL

------
callmeed
_"As in a typical multi-tenant architecture, one database per customer."_

Maybe I've misunderstood what "multi-tenant" means for the last 5 years, but I
always thought of it as "multiple customers on a single installation/instance
and using a single database" ... as in, every customer has some sort of
account model other models are scoped to the account.

I'm sure there are valid reasons for having a unique db per customer (we do it
in one product) but it seems easier to maintain/backup a single one.

~~~
snprbob86
If your data is easily and completely partitioned by customer, it may make
sense to do so. It's as easy a partitioning scheme you can come up with. It
gives you the power to do staged roll outs and other flexibility in
deployment, as well as security isolation that makes your life easier and some
customers may prefer.

------
idan
FWIW, I'm no expert but...

I don't really see much of a security benefit to "one DB per customer". The
actual data will be stored on the same disks, the queries issued to through
the same DB processes. This seems like a huge amount of overhead for very
little realized gain.

~~~
edanm
snprbob86's point from another comment makes sense: it gives you more control.
For example, doing staged rollouts of different features for different
customers, etc. I'm not saying it's worth the headaches, but there _are_
advantages.

~~~
snprbob86
Also consider behind the firewall installs or premium dedicated hosting.

One of the nice things about enterprise software is that you can charge for
things like this. If you have a large enough customer, even if you are
hosting, you can simply forward to them your bill for a dedicated server.

------
ezmobius
This approach will not scale. In fact vie seen it personally with multiple of
my hostingnclients fall over after even 5000 databases. There has to be a file
per table with innodb usually and this approach will work until younhave a few
thousand customersmand then you will feel the pain of refactoring this intoms
true multi tenant db system

~~~
jseifer
Have you seen similar problems with Postgres? That's what the author is using.

------
chrismsnz
Seems like row-level permissions seems to be a better paradigm to make this
work. Last I checked, Django had some support for it - recently improved in
their 1.2 release.

Here's a Django Advent post, published while they were ramping up for the 1.2
release about row-level/object permissions:

<http://djangoadvent.com/1.2/object-permissions/>

------
niels
I'm confused... why would they ever come up with a solution like this?

~~~
mattchew
I assume by "like this" you mean a separate db for each customer.

The reason is probably a perceived improvement in security. I've worked in
systems where they've done this for security reasons. I believe Joel Spolsky
did it for this reason when he was betaing a release of product (FogBugz?).

It certainly does jack up the maintenance overhead. I wouldn't design a system
this way myself.

I'm really curious if this is in fact a common practice out there in the wide
world.

~~~
neeleshs
"It certainly does jack up the maintenance overhead."

A bit of automation would solve this problem,as far as I can see it.

"I'm really curious if this is in fact a common practice out there in the wide
world."

It is. I've worked for companies which do this, and have scaled with Oracle
for several tens of thousands of customers (=databases).

Also, complex reports that require hand-coded SQL becomes that much simpler.
(Compared to having all customers' data in a single database)

------
StavrosK
This solution seems overcomplicated. It might work fine, I've never tried it,
but I can't imagine databases (even postgres) being optimized for such things.
The way we did it at <http://historio.us> was to just have a foreign key to
each user (this can be easily extended to groups of users, i.e. companies) and
just performed a lookup of the related fields depending on the domain.

This has worked very well so far, it's very fast, scalable and has given us
zero problems. Since all the tables are the same, I imagine it should work
very well for the poster, with an appropriate index, too...

------
petethomas
The main advantage of one-database-per-business-customer isn't isolating
customer data, it's the ability to serve business customers who are willing to
pay for heavy customizations and the extra support those entail, while not
polluting your core offering through the introduction of attributes, features,
etc. for which other customers will have no use.

For a lot of B2B web applications aimed at serving "bigger" companies,
customer adoption can still hinge on scotch-sipping and rounds of golf more
than well-placed sign-up buttons or great viral marketing campaigns. It's been
argued of late that maybe this part of the industry is becoming more "web-
like"[1], but in any case it's not very web-like yet, when it comes to sealing
deals.

Many times the companies who were convinced by your talented sales team to buy
your product eventually demand that the product be tailored to their
(sometimes very company-specific) needs. If you want to keep the customer
happy, or maybe just keep the customer, you have to keep customizing.

One could make the argument that the most talented sales teams are the ones
that sign deals with customers who are willing to "stay on core", but over
time as your customers' own businesses evolve, the challenge grows for
everyone involved.

[1] [http://techcrunch.com/2010/07/25/enterprise-software-is-
sexy...](http://techcrunch.com/2010/07/25/enterprise-software-is-sexy-again/)

~~~
danieldon

        it's the ability to serve business customers who are 
        willing to pay for heavy customizations and the extra
        support those entail
    

You'd need to have flags in the app itself, which would likely turn into an
unmaintainable monstrosity. If you have a handful of large clients, are
heavily customizing the app for each one, are using a modern framework like
Django or Rails and a DVCS, the obvious maintainable approach is to have full
individual installs, forks for each install and a modular application that
pushes the customizations into pluggable components as much as possible.

~~~
pilif
Having a shared core and a lot of hooks for customization is the road we have
taken for our application. I've written a lengthy post about it just last
friday:

<http://news.ycombinator.com/item?id=1557218>

------
danieldon
The 1 DB/customer approach is usually a bad idea. Strict data security
requirements are one of the only situations this approach makes sense compared
to the traditional single DB approaches or individual installs.

------
neeleshs
Based on ibejoeb's comments, there is an updated entry at
<http://tidbids.posterous.com/> which uses one schema per user.

Thanks ibejoeb!

