

Which database should my startup use? - chazwozz

We are looking at doing a startup website that will hopefully receive lots of traffic.

I am interested to know what databases people use for their startups.

Obviously MySQL or other open source ones are the cheapest options, but would these hold up for a serious site like Digg, Facebook, etc.? Or would it be better to start off with something like Oracle?

======
ratsbane
The database is only one of a list of important architectural choices. Are you
going to use a *nix (Linux, BSD, Solaris...) or Windows? What programming
languages? Perl, Python, Java, Visual Basic... Applesoft Basic? Are you going
to use a framework or code everything from scratch?

These answers are all important, but it is possible to do a good job or a bad
job with any of those languages. Ultimately the choice of platform, languages,
database, etc. are not as important as how you put the pieces together.

You might also look at what other successful sites are using, particularly those with
data structures similar to what you are planning. I believe Facebook and
Wikipedia both use PHP and MySQL. FlightAware (which does a really super job
with lots of real-time data) uses Postgres.

While it is, I'm sure, possible to design a very effective site with almost
any combination, Windows/.Net/MSSQL seems like a particularly poor choice.
There are only a few big data-driven sites using that technology and most of
them aren't technically very good. Myspace is one example - (I think they use
MSSQL) - but they have horrible response times and frequent hiccups.
Ancestry.com is another big .Net site but they have a lot of problems. Pages
frequently hang while loading, responses are not quick at all, and their UI is very
awkward. Several of the big airline sites run .Net but they aren't very good
either. Big companies whose primary business isn't IT seem to go with .Net a
lot and it shows in the quality of their sites - but that's not so bad - it
creates a lot of opportunities for people like us. While I'm sure it is
POSSIBLE to do good work on the Microsoft platform it seems very, very
unlikely.

My personal choice after having worked, at one time or another, with all of
the options discussed here is very clearly LAMP - Linux, Apache, MySQL,
Perl|PHP|Python but I wouldn't object to substituting BSD, Lighttpd or
Postgres.

As someone around here said recently, "...anyone proposing to run Windows on
servers should be prepared to explain what they know about servers that
Google, Yahoo, and Amazon don't."

~~~
stuki
I believe Netflix is built on .net. They might not be Google, Yahoo or Amazon,
but damn...

~~~
felipe
I think Netflix is built in Java (at least part of it), because:

1) When you log in, you will get redirected to a .jsp page

2) Netcraft identifies Netflix server as Apache Coyote, which is the Tomcat
connector (Tomcat is Apache's Java-based web server)

<http://uptime.netcraft.com/up/graph?site=www.netflix.com>

~~~
stuki
It sure does look like you're right. I could have sworn I saw .asp pages there
when I was a member, but that was probably 4-5 years ago. One less reason to
use .net, I guess.

------
staunch
Oracle shouldn't even be on the table. If you get popular at all they will try
to rip you off and likely succeed. Not to mention it's a beastly system to
manage and you'll never use most of its unique features.

My preference is multiple systems. MySQL for most stuff and Postgres for
reporting and financial data. It's _easier_ to scale MySQL because so many
people have done it and it's simple. Postgres has lots of nice higher end
database features for reporting and I feel (a bit) safer having it handle
money.

When the purchase cost for both is $0 and managing them is easy it makes sense
not to force yourself into making a decision about which camp you're in. More
important than which you choose is learning to leverage memory well, and
Memcached is the easiest scalable way to do that.
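
A minimal sketch of the cache-aside pattern that Memcached is typically used for. Everything here is an assumption for illustration: `TinyCache` is a hypothetical in-process stand-in for a Memcached client, and `load_from_db` is whatever function actually queries your database.

```python
import time


class TinyCache:
    """Hypothetical in-process stand-in for a Memcached client (get/set with TTL)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # An expired entry behaves like a cache miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cached_lookup(cache, key, load_from_db, ttl_seconds=60):
    """Return the cached value, falling back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # only hit the database on a miss
        cache.set(key, value, ttl_seconds)
    return value
```

Repeated reads of the same key within the TTL never reach the database, which is where the memory leverage comes from. The cost is that cached data can be stale for up to `ttl_seconds`.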

------
jamiequint
don't you work for oracle??

<http://www.blogger.com/profile/01908257636257717349>

~~~
ratsbane
Good grief.

I feel cheap and used, like a $3 whore.

Or a $3 piece of carpet. Or anything else that only costs $3.

~~~
ratsbane
... $3 which would buy about 5/8 gallon of jet fuel for Larry Ellison's
Gulfstream V, which, if I had wanted to help pay for, I would have used Oracle
on something.

------
pg
One viable option is: none. Desktop apps don't automatically have to include a
database. Why should web apps?

~~~
chazwozz
How would you store the users' data?

~~~
pg
The same way desktop apps do: write it to disk.

~~~
mojuba
The filesystem doesn't handle concurrency and atomicity of some operations
well, while in web apps it is absolutely necessary - you serve multiple users
concurrently and you will inevitably have some shared data.

I tried to avoid databases in my web projects in the past and ended up having
quite complicated concurrency infrastructures, based on shared memory and the
filesystem. I can't say I was happy with the complexity I got.

Databases, no matter how ugly or clumsy, handle it nicely. In a few cases I made
a complete switch to a DB and had my code reduced significantly.

So, for me this is still an open question.
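
To give a sense of the extra work involved, here is a rough sketch of just one piece of it: writing a file so that concurrent readers never see a half-written version. The function name and the JSON format are assumptions for illustration. The atomic rename relies on POSIX semantics via `os.replace`.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This only prevents torn writes. Two requests that each read the file, modify it, and write it back can still overwrite each other's changes, so some locking is needed on top. A database transaction gives you that for free.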

------
walesmd
Go with MySQL.

Right now you have no startup, you should go with the option that will allow
you to launch the quickest and cheapest. Chances are - you won't see 1/10th
the traffic digg or facebook do.

Not trying to be mean - I'm being realistic. You can worry about scaling when
the time comes - but it probably won't.

~~~
chazwozz
Don't worry, you haven't upset me :) Let's play in fantasy land and say it does
get that kind of traffic. I want to know if something like
MySQL/Postgres/SQLite is up to the task? Sometimes you gotta spend money to
make money, and I wouldn't want to head down a cheaper path if it would
compromise the site or the delivery of a quality service to people in any way.

~~~
SwellJoe
"I want to know if something like MySQL/Postgres/SQlite is up to the task?"

You haven't defined the task, other than to say it is a web application.

At least one of those databases is up to the task, if the task is, in fact, an
application that requires a database. Some of the largest sites in the world
use one, two, or all three of those databases in one way or another. The best
designed of them probably use them appropriately, and thus may have a use for
both SQLite and one of MySQL or PostgreSQL.

------
morselsrule
I would go with MySQL simply because so many big sites have used it: Facebook,
Youtube, Digg, Yahoo, etc. If it's good enough for them, and free, it doesn't
seem like you can go wrong.

See a list of companies using MySQL:
<http://www.mysql.com/news-and-events/press-release/release_2006_34.html>

Or a talk from the guys at YouTube about how they scaled:
<http://video.google.com/videoplay?docid=-6304964351441328559>

The only exception would be if you are primarily handling financial
transactions. Then perhaps Oracle would be the way to go, since your customers
will be more conservative about the technologies they would accept.

------
patrickg-zill
Out of the gate, I would recommend Postgres.

However, choosing which database to use should come after you establish a few
of the parameters that matter to you.

There is little reason to use a for-pay database unless you are doing
something special with the features. Oracle for instance has some interesting
text-search stuff built into the DB. But if you know you need that, then you
are already starting to narrow the list of candidates.

------
stuki
Since no one else seems to be doing so, let me put in a plug for MSSQL. If
you're using the Microsoft stack, it ought to be the default choice. The
integration with the rest of the environment (LINQ, db process LCR, Visual
Studio integration, async transactional queuing, one heck of a molap engine, a
great etl tool and more) is just too good to give up. And it's a darned nice
database to boot, with 2008 on the way being even nicer. Also, Microsoft has
publicly assured customers they will continue to license per socket, not per
core. While still more expensive than the free db's (duh), at least you're not
stuck handing over all future Moore's law gains to Larry Ellison.

~~~
mpc
Hand over those gains to Steve Ballmer instead?

I use .net and mssql at work and will admit that the tooling around ms
products is superb. However, I really don't know what we pay to deploy on it.
A lot more than a mysql deployment I am sure.

~~~
stuki
The point I was trying to make is that since processors now almost exclusively
gain performance by adding cores rather than by speeding up each individual
core, a per socket license lets the customer continue to keep the gains from
processor evolution.

With Oracle's per core license, if you swap out your dual cores for quads, you
owe twice (or thereabouts) as much in license fees, essentially guaranteeing
your (some processor performance metric) / (license fee) ratio no longer
increases over time.
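
The arithmetic can be sketched as follows. The prices are hypothetical, chosen only so that both models cost the same on a two-socket, dual-core machine. They are not real Microsoft or Oracle list prices.

```python
def license_fee(sockets, cores_per_socket, unit_price, per):
    """Total fee when licensing per 'socket' or per 'core'."""
    if per not in ("socket", "core"):
        raise ValueError("per must be 'socket' or 'core'")
    units = sockets if per == "socket" else sockets * cores_per_socket
    return units * unit_price


# Hypothetical prices for illustration only.
PER_SOCKET_PRICE = 10_000
PER_CORE_PRICE = 5_000

for cores_per_socket in (2, 4):
    socket_fee = license_fee(2, cores_per_socket, PER_SOCKET_PRICE, "socket")
    core_fee = license_fee(2, cores_per_socket, PER_CORE_PRICE, "core")
    print(f"{cores_per_socket} cores/socket: "
          f"per-socket fee {socket_fee}, per-core fee {core_fee}")
```

Going from dual to quad cores doubles the core count. Under the per-socket model the fee stays the same, so cores per dollar doubles. Under the per-core model the fee doubles too, so cores per dollar stays flat.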

~~~
mpc
OK, I see your point now.

What case can you make for choosing a commercial product over MySQL when it's
just as good? (but lacks tooling)

~~~
stuki
Tooling and integration are definitely major points. I guess any time you get
the whole app stack from a single vendor, you're likely to get a more
integrated product (os/400, db2, RPG and green screens; ...maybe not.....:)),
while at the same time forfeiting the opportunity to pick the individual
components that suit you best. You're obviously more dependent on
developments at your vendor as well. So if you believe MS is dying, you should
probably choose another database. Considering Word's dictionary includes RPG,
yet neither Postgres nor MySQL, that belief might not even be that
farfetched:).

Now, for what I am trying to do, easy access to a proven distributed
transaction infrastructure is non-negotiable. MS provides this. So does
Oracle, and certainly IBM. The Java world provides this in spades as well, but
I'm not sure how nicely the popular open source db's play along. Last time I
looked at Postgres, this seemed at best a medium priority work in progress.
Since it doesn't take too many systems left in disagreement about the status
of a million dollar transfer to annoy the heck out of some people, in this
case I'd rather stick with one of the known entities.
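
For readers unfamiliar with the idea, here is a toy sketch of two-phase commit, the protocol that distributed transaction infrastructure is generally built around. The class and function names are hypothetical. A real implementation also needs durable logs, timeouts, and crash recovery, all omitted here.

```python
class Participant:
    """Toy resource manager holding one balance; names are hypothetical."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self._pending = None

    def prepare(self, delta):
        # Phase 1: vote yes only if the change keeps the balance non-negative.
        if self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self):
        # Phase 2: apply the change that was promised during prepare.
        self.balance += self._pending
        self._pending = None

    def rollback(self):
        self._pending = None


def transfer(source, target, amount):
    """Move amount from source to target; either both sides change or neither."""
    plan = [(source, -amount), (target, amount)]
    prepared = []
    for participant, delta in plan:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # Any "no" vote aborts everyone who already voted yes.
            for p in prepared:
                p.rollback()
            return False
    for participant in prepared:
        participant.commit()
    return True
```

The point of the two phases is that no participant applies its change until every participant has agreed, so the systems cannot end up disagreeing about whether the transfer happened.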

For somewhat similar reasons, I prefer MSSQL's ability to replicate in
synchronous mode over the asynchronous / possibly lossy replication that MySQL
ships with (.net client auto failover is a nice added bonus). I believe there
are add on products for MySQL and Postgres to accomplish the same, but that
they are neither free nor used, developed and tested as widely as MS' version.

In addition, MSSQL comes with non-RDBMS features that Postgres/MySQL lack.
Some of these may have Unix/Linux/open source, or at least Java,
competitors/equivalents, but since I intend to use several of them, combined
they add a lot of value to Sql Server. For example, a pet peeve of mine is
that the root of at least some evil is doing synchronously that which could be
done asynchronously (obviously not db replication:)). Since my app is
financial enough to 'require' strong transactional support, Service Broker
asynchronous transactional message queues alone might have been enough to sell
me on Sql Server.
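
The general idea of a transactional message queue can be sketched without Service Broker. This is not Service Broker's API. It is a minimal illustration using Python's built-in `sqlite3`, with hypothetical table names (`accounts`, `outbox`), of enqueueing a message in the same transaction as the data change it describes.

```python
import sqlite3


def open_db():
    """Create an in-memory database with one account and an empty queue."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " body TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def debit_and_enqueue(conn, account_id, amount):
    """Debit the account and queue a message about it in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO outbox (body) VALUES (?)",
            (f"debited {amount} from account {account_id}",),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
```

If the debit violates the `CHECK` constraint, the queued message is rolled back with it, so a background worker draining `outbox` never acts on a change that did not happen.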

I'm also assuming many end users will be Excel jockeys. Hence, giving them
access to slice and dice 'their' data in Analysis Services cubes (still trying
to wrap my head around the security implications of this one), and integrate
said cubes into their own worksheets, instead of just providing canned
reports, looks advantageous. Less work on my part as well:).

And, although I have written my share of database transfer and replication
scripts in Perl and Python, the latest SSIS seems a better tool for that
purpose. I'm assuming I need to generate reports involving data from multiple
outside systems.

Damn, this ended up being a long post. Hopefully some will find it valuable.
If nothing else, the pills I popped are finally winning the war against my
Saturday 'morning' hangover:).

------
rzwitserloot
Postgres is the most flexible option. For example, the Google Hibernate Shards
team mostly works off of Postgres. Postgres is built with proper transactional
support right from the ground up, instead of as a table option as with MySQL,
and it's more open source than MySQL.

An article I wrote on postgres' scalability before, including some juicy links
with shiny graphs:

<http://www.zwitserloot.com/2006/12/02/database-land-postgres-scales-nicely/>

And as you seem to work for oracle, my 'endorsement' of the quality of that
particular little blight on this world should make for something nice to
report back to Larry.

------
phuego
PostgreSQL. It scales very well in my opinion, although I have yet to get to
Facebook-sized numbers :-)

I second all the 'has real features and is more reliable' comments that
precede this one; I add to those my number one favourite. I can develop on
Win32 and deploy on Linux and apply backups, schema changes, what have you
from one to the other without any pain at all. I started with MySQL but simply
could not achieve the above. YMMV naturally.

------
mkull
Oracle and MSSQL are right off the table.

Personally I prefer PostgreSQL, but if you are a database newb I would
recommend MySQL. (I really don't mean that in a bad way, MySQL is easier to
use as a beginner, and it has proven it can scale)

------
dood
If you're considering sqlite, you should read 'when to use sqlite'
[<http://www.sqlite.org/whentouse.html>]. Also see this thread 'Thoughts on
using SQLite?' [<http://news.ycombinator.com/item?id=38287>].

------
SwellJoe
Are you sure you need a database? Maybe "none" is the right answer.

If it isn't, then maybe SQLite, PostgreSQL, or MySQL. At least, that's the
order of preference for me, but I avoid databases when possible.

Google uses MySQL heavily, so it obviously scales (but I'm given to understand
they had to work hard to do it).

------
natrius
Don't listen to the people recommending SQLite. Use SQLite when you're
developing or when you want a database for a desktop application. Don't use it
for a production web application.

------
Keios
I would recommend PostgreSQL, and you probably won't need to switch to Oracle
if you use Postgres properly. Importantly, it's free and has a usable and
friendlier license than MySQL.

I am working on a startup idea which will also have high traffic and I am
personally using Schevo, which is a non-SQL object DBMS for Python based on
Durus. I like Schevo because SQL sucks and using Schevo is like using a
programming language, in this case Python. You can check out the Schevo
project at www.schevo.org. This would work for you only if you are using
Python (highly recommended). Schevo does not have very good documentation yet,
but it has excellent support on IRC and Google Groups from its creators;
at least I have found them to give complete and quick responses. Hope this
helps.

------
codeslinger
You should not use any, if you can get away with it. Start with Hadoop
(<http://lucene.apache.org/hadoop/>) and see if you can wrap your head around
that first before tying yourself to the RDBMS lifestyle.

------
jamongkad
For my startup I use MySQL although these days I'm checking out alternative
DBs. But based on my research it seems MySQL gets the most support from
hackers (correct me if I'm wrong). In terms of big companies that use it, I do
know Flickr uses MySQL.

~~~
rms
I've seen studies showing PostgreSQL scales better but I assume it's the kind
of thing where you can manipulate the data to get whatever results you want. I
know MySQL is better supported but can anyone prove it is faster than
PostgreSQL?

~~~
jamongkad
I see. Oh, please forgive me if I look like I'm link jacking, but I was
wondering if we can touch base on your talks on UI design? It's been a topic
of great interest to me.

~~~
rms
no problem, I send you an invite to chat in gmail

------
ctdean
MySQL and PostgreSQL are fine relational databases and you can build large
systems out of them.

You might also consider embedded DBs such as SQLite (relational) or qdbm (non
relational). If it were me, I'd go with one of these.

Good luck!

------
davidw
Depends - if you just need data storage that's fast, and don't care about
having 100% data integrity, MySQL seems popular. I wouldn't trust people's
money to it, though.

I've long preferred PostgreSQL, because it's always had those things that make
a relational DB a relational DB - foreign keys, lots of consistency checks,
things like that. I have to admit, though, that as of MySQL 5, with InnoDB, it
has started to resemble a "real database".
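
A small sketch of what those consistency checks buy you. Python's built-in `sqlite3` stands in for PostgreSQL or MySQL/InnoDB here, and the `users` and `orders` tables are hypothetical. Note that SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " total_cents INTEGER NOT NULL CHECK (total_cents >= 0))"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")


def try_insert_order(user_id, total_cents):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
            (user_id, total_cents),
        )
        return "accepted"
    except sqlite3.IntegrityError:
        # Raised for a missing parent row or a failed CHECK constraint.
        return "rejected"
```

An order for a user that does not exist, or one with a negative total, is refused by the database itself, so a bug in application code cannot leave orphaned or nonsensical rows behind.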

Of course, for some applications, data integrity really doesn't matter that
much, and you'd prefer speed, so MySQL with its native db type might be
faster.

------
mpc
If you're doing a startup then the whole point is to launch fast. Why would
you want to choose a technology that could get in the way with licensing costs
and complicated bloat?

Pick something that is free and good.

------
dfranke
Start with PostgreSQL. Switch to Oracle if you need to and can afford it.

~~~
benhoyt
Just wondering what advantages Oracle has over Postgres? (I'm skeptical, but I
honestly don't know.)

We chose Postgres over MySQL by a very fine margin, but we're pleased we did.
Seems to have fewer "try to make your life easier but end up doing the
opposite" quirks.

~~~
dfranke
The optimizer is smarter and it's better at concurrency control. You'll get
better performance out of Oracle when you have very-long-running transactions
or lots of transactions at once operating on the same tables. The former isn't
a concern for most web applications, but the latter might be.

The above is received wisdom from academics, though, so it may be dated.
PostgreSQL has improved a lot in recent years.

------
bootload
_"... Obviously MySQL or other open source ones are the cheapest options ..."_

do you need a database?

------
run4yourlives
This is the honest answer: It doesn't matter.

(Assuming you're not considering MS Access)

------
aitoehigie
In my opinion, I would suggest MySQL: it's free, open source, the list goes
on... I would like to ask if there are any African hackers who visit
ycombinator.com? If not, then I guess I am the only one.

