Which database should my startup use?
17 points by chazwozz on Aug 10, 2007 | 53 comments
We are looking at doing a startup website that will hopefully receive lots of traffic.

I am interested to know what databases people use for their startups.

Obviously MySQL or other open source ones are the cheapest options, but would these hold up a serious site like Digg, Facebook, etc? Or would it be better starting off with something like Oracle?




The database is only one of a list of important architectural choices. Are you going to use a *nix (Linux, BSD, Solaris...) or Windows? What programming languages? Perl, Python, Java, Visual Basic... Applesoft Basic? Are you going to use a framework or code everything from scratch?

These answers are all important, but it is possible to do a good job or a bad job with any of those languages. Ultimately the choice of platform, languages, database, etc. is not as important as how you put the pieces together.

You might also look at what other successful sites are using, particularly those with data structures similar to what you are planning. I believe Facebook and Wikipedia both use PHP and MySQL. FlightAware (which does a really super job with lots of real-time data) uses Postgres.

While I'm sure it is possible to design a very effective site with almost any combination, Windows/.Net/MSSQL seems like a particularly poor choice. There are only a few big data-driven sites using that technology and most of them aren't technically very good. Myspace is one example - (I think they use MSSQL) - but they have horrible response times and frequent hiccups. Ancestry.com is another big .Net site but they have a lot of problems. Pages frequently hang while loading, response is not quick at all, and their UI is very awkward. Several of the big airline sites run .Net but they aren't very good either. Big companies whose primary business isn't IT seem to go with .Net a lot and it shows in the quality of their sites - but that's not so bad - it creates a lot of opportunities for people like us. While I'm sure it is POSSIBLE to do good work on the Microsoft platform, it seems very, very unlikely.

My personal choice after having worked, at one time or another, with all of the options discussed here is very clearly LAMP - Linux, Apache, MySQL, Perl|PHP|Python but I wouldn't object to substituting BSD, Lighttpd or Postgres.

As someone around here said recently, "...anyone proposing to run Windows on servers should be prepared to explain what they know about servers that Google, Yahoo, and Amazon don't."


I worked in a few corporates and they all seem to favour .Net (for obvious MS support/marketing/mgt related reasons) but most of the solutions seemed to be okay.

Also I notice a trend that if you want to sell to corporates/companies they prefer .Net solutions as it ties in better with what they use (most business still uses MS, like it or not) and are more reluctant to go with opensource based solutions.

What do you think?


I believe Netflix is built on .net. They might not be Google, Yahoo or Amazon, but damn...


I think Netflix is built in Java (at least part of it), because:

1) When you log in, you will get redirected to a .jsp page

2) Netcraft identifies Netflix server as Apache Coyote, which is the Tomcat connector (Tomcat is Apache's Java-based web server)

http://uptime.netcraft.com/up/graph?site=www.netflix.com


It sure does look like you're right. I could have sworn I saw .asp pages there when I was a member, but that was probably 4-5 years ago. One less reason to use .net, I guess.


Oracle shouldn't even be on the table. If you get popular at all they will try to rip you off and likely succeed. Not to mention it's a beastly system to manage and you'll never use most of its unique features.

My preference is multiple systems. MySQL for most stuff and Postgres for reporting and financial data. It's easier to scale MySQL because so many people have done it and it's simple. Postgres has lots of nice higher end database features for reporting and I feel (a bit) safer having it handle money.

When the purchase cost for both is $0 and managing them is easy, it makes sense not to force yourself into making a decision about which camp you're in. More important than which you choose is learning to leverage memory well. Memcached is the easiest scalable way to do that.
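For what "leveraging memory" usually looks like in practice, here is a minimal cache-aside sketch. It is only an illustration: `TinyCache` is a made-up in-process stand-in for a memcached client, and `load_profile_from_db` is a hypothetical placeholder for whatever slow query you would otherwise run on every request.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (get/set with a TTL)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


db_reads = 0  # counts how often the (pretend) database is hit


def load_profile_from_db(user_id):
    """Hypothetical slow query; here it just builds a dict."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user{user_id}"}


def get_profile(cache, user_id, ttl_seconds=60):
    """Cache-aside read: try memory first, fall back to the database."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_profile_from_db(user_id)
    cache.set(key, profile, ttl_seconds)
    return profile


cache = TinyCache()
first = get_profile(cache, 42)   # miss: goes to the database
second = get_profile(cache, 42)  # hit: served from memory
```

The second call never touches the database, which is the whole point. With a real memcached the same shape applies, but the cache lives in a separate process shared by all your web servers.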


One viable option is: none. Desktop apps don't automatically have to include a database. Why should web apps?


how would you store the users data?


you could just remember it yourself! creating the ultimate live web 2.0, where once a user enters a username and password, you actually have to verify it at that exact moment or else they can't login! then you could hire midgets and have them remember and rebuild profiles for people and unicorns and...

anyways, i thought most (simple) desktop apps store data in text files? (perhaps encrypted?) it's definitely possible to store data in text files for web apps but seems inefficient for many reasons - speed and scalability being the first that come to mind. i've stored data in text files before for very simple apps that contained a couple of fields and had no sensitive information.

my personal suggestion: go with MySQL. it's free, well documented, efficient, scalable, etc. the list goes on...


That is his obtuse way of saying you don't have to use an RDBMS. You will still use some sort of database of course (Filesystem, BDB ... etc).


The same way desktop apps do: write it to disk.


The filesystem doesn't handle concurrency and atomicity of some operations well, while in web apps it is absolutely necessary - you serve multiple users concurrently and you will inevitably have some shared data.

I tried to avoid databases in my web projects in the past and ended up having quite complicated concurrency infrastructures, based on shared memory and the filesystem. I can't say I was happy with the complexity I got.

Databases, no matter how ugly and clumsy, handle it nicely. In a few cases I made a complete switch to a DB and had my code reduced significantly.
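To make the atomicity point concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for whichever database you pick. The `accounts` table and the `transfer` helper are invented for the example; the idea is just that two related writes either both land or neither does, without any hand-rolled locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance; the credit to
        # dst made earlier in the same transaction has been rolled back.
        return False


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Getting the same guarantee out of plain files means writing your own locking and crash recovery, which is roughly the complexity described above.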

So, for me this is still an open question.



Hah. Good catch. I guess it explains why he thinks "Sometimes you gotta spend money to make money..." makes sense with web app databases. A more honest way of putting that would be "Sometimes you gotta spend money for me to make money charging you to support black box bloatware..."

Astroturfed.


Good grief.

I feel cheap and used, like a $3 whore.

Or a $3 piece of carpet. Or anything else that only costs $3.


... $3 which would buy about 5/8 gallon of jet fuel for Larry Ellison's Gulfstream V, which, if I had wanted to help pay for, I would have used Oracle on something.


appreciate the comment earlier. some of the more constructive feedback


..after the heart attack inducing story about GDrive, I'm now close to having another one in laughter from this posting...


haha, yeah i do, i'm a middleware guy though. The question is nothing to do with the company i work for.. honest! :)


Go with MySQL.

Right now you have no startup, so you should go with the option that will allow you to launch the quickest and cheapest. Chances are you won't see 1/10th the traffic Digg or Facebook do.

Not trying to be mean - I'm being realistic. You can worry about scaling when the time comes - but it probably won't.


Don't worry, you haven't upset me :) Let's play in fantasy land and say it does get that kind of traffic. I want to know if something like MySQL/Postgres/SQLite is up to the task? Sometimes you gotta spend money to make money, and I wouldn't want to head down a cheaper path if it would compromise the site and delivering a quality service to the people in any way.


"I want to know if something like MySQL/Postgres/SQlite is up to the task?"

You haven't defined the task, other than to say it is a web application.

At least one of those databases is up to the task, if the task is, in fact, an application that requires a database. Some of the largest sites in the world use one, two, or all three of those databases in one way or another. The best designed of them probably use them appropriately, and thus may have a use for both SQLite and one of MySQL or PostgreSQL.


Yes, Digg uses MySQL and other optimizations.

http://www.computerworld.com/action/article.do?command=viewA...


yes, they are.


I would go with MySQL simply because so many big sites have used it: Facebook, Youtube, Digg, Yahoo, etc. If it's good enough for them, and free, it doesn't seem like you can go wrong.

See a list of companies using MySQL: http://www.mysql.com/news-and-events/press-release/release_2...

Or a talk from the guys at YouTube about how they scaled: http://video.google.com/videoplay?docid=-6304964351441328559

The only exception would be if you are primarily handling financial transactions. Then perhaps Oracle would be the way to go, since your customers will be more conservative about the technologies they would accept.


Out of the gate, I would recommend Postgres.

However, choosing which database to use should come after you establish a few of the parameters that matter to you.

There is little reason to use a for-pay database unless you are doing something special with the features. Oracle for instance has some interesting text-search stuff built into the DB. But if you know you need that, then you are already starting to narrow the list of candidates.


Since no one else seems to be doing so, let me put in a plug for MSSQL. If you're using the Microsoft stack, it ought to be the default choice. The integration with the rest of the environment (LINQ, db process LCR, Visual Studio integration, async transactional queuing, one heck of a molap engine, a great etl tool and more) is just too good to give up. And it's a darned nice database to boot, with 2008 on the way being even nicer. Also, Microsoft has publicly assured customers they will continue to license per socket, not per core. While still more expensive than the free db's (duh), at least you're not stuck handing over all future Moore's law gains to Larry Ellison.


Hand over those gains to Steve Ballmer instead?

I use .net and mssql at work and will admit that the tooling around ms products is superb. However, I really don't know what we pay to deploy on it. A lot more than a mysql deployment I am sure.


The point I was trying to make is that since processors now almost exclusively gain performance by adding cores rather than by speeding up each individual core, a per socket license lets the customer continue to keep the gains from processor evolution.

With Oracle's per core license, if you swap out your dual cores for quads, you owe twice (or thereabouts) as much in license fees, essentially guaranteeing your (some processor performance metric) / (license fee) ratio no longer increases over time.
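A back-of-the-envelope sketch of that argument, with made-up prices (the $1000 and $500 figures are assumptions for illustration only, not real list prices, and any vendor-specific core multipliers are ignored):

```python
def license_cost(sockets, cores_per_socket, unit_price, per):
    """Total license fee under a per-socket or per-core scheme."""
    if per == "socket":
        units = sockets
    elif per == "core":
        units = sockets * cores_per_socket
    else:
        raise ValueError(f"unknown licensing scheme: {per}")
    return units * unit_price


# Hypothetical prices, chosen so both schemes cost the same on dual cores.
PER_SOCKET_PRICE = 1000
PER_CORE_PRICE = 500
SOCKETS = 2

results = {}
for label, cores_per_socket in [("dual", 2), ("quad", 4)]:
    total_cores = SOCKETS * cores_per_socket
    socket_fee = license_cost(SOCKETS, cores_per_socket, PER_SOCKET_PRICE, "socket")
    core_fee = license_cost(SOCKETS, cores_per_socket, PER_CORE_PRICE, "core")
    results[label] = {
        "socket_fee": socket_fee,
        "core_fee": core_fee,
        # Cores obtained per dollar of license fee under each scheme.
        "cores_per_dollar_socket": total_cores / socket_fee,
        "cores_per_dollar_core": total_cores / core_fee,
    }
```

Under these assumed numbers, going from dual to quad cores leaves the per-socket fee unchanged, so cores per license dollar doubles, while the per-core fee doubles and cores per license dollar stays flat.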


OK, I see your point now.

What case can you make for choosing a commercial product over MySQL when it's just as good? (but lacks tooling)


Tooling and integration are definitely major points. I guess any time you get the whole app stack from a single vendor, you're likely to get a more integrated product (os/400, db2, RPG and green screens; ...maybe not.....:)), while at the same time forfeiting the opportunity to pick the individual components that suit you best. You're obviously more dependent on developments at your vendor as well. So if you believe MS is dying, you should probably choose another database. Considering Word's dictionary includes RPG, yet neither Postgres nor MySQL, that belief might not even be that farfetched:).

Now, for what I am trying to do, easy access to a proven distributed transaction infrastructure is non negotiable. MS provides this. So does Oracle, and certainly IBM. The Java world provides this in spades as well, but I'm not sure how nicely the popular open source db's play along. Last time I looked at Postgres, this seemed at best a medium priority work in progress. Since it doesn't take too many systems left in disagreement about the status of a million dollar transfer to annoy the heck out of some people, in this case I'd rather stick with one of the known entities.

For somewhat similar reasons, I prefer MSSQL's ability to replicate in synchronous mode over the asynchronous / possibly lossy replication that MySQL ships with (.net client auto failover is a nice added bonus). I believe there are add on products for MySQL and Postgres to accomplish the same, but that they are neither free nor used, developed and tested as widely as MS' version.

In addition, MSSQL comes with non rdbms features that Postgres/MySQL lacks. Some of these may have Unix/Linux/open source, or at least Java, competitors/equivalents, but since I intend to use several of them, combined they add a lot of value to Sql Server. For example, a pet peeve of mine is that the root of at least some evil is doing synchronously that which could be done asynchronously (obviously not db replication:)). Since my app is financial enough to 'require' strong transactional support, Service Broker asynchronous transactional message queues alone might have been enough to sell me on Sql Server.
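To show what I mean by a transactional message queue, here is a rough sketch of the general idea. This is not Service Broker and does not use its API; it is the plain "outbox table" technique written against Python's built-in `sqlite3` module, with table names and helpers (`payments`, `outbox`, `record_payment`, `drain_outbox`) invented for the example. The message is written in the same transaction as the business data, and a worker handles it later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY, body TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
)
conn.commit()


def record_payment(conn, amount_cents, fail=False):
    """Insert the payment and its follow-up message in one transaction."""
    try:
        with conn:  # commit both rows together, or roll both back
            cur = conn.execute(
                "INSERT INTO payments (amount_cents) VALUES (?)", (amount_cents,)
            )
            conn.execute(
                "INSERT INTO outbox (body) VALUES (?)",
                (f"send receipt for payment {cur.lastrowid}",),
            )
            if fail:
                raise RuntimeError("simulated crash before commit")
        return True
    except RuntimeError:
        return False


def drain_outbox(conn, handler):
    """Run later by a worker: handle pending messages and mark them done."""
    rows = conn.execute(
        "SELECT id, body FROM outbox WHERE done = 0 ORDER BY id"
    ).fetchall()
    for msg_id, body in rows:
        with conn:
            handler(body)
            conn.execute("UPDATE outbox SET done = 1 WHERE id = ?", (msg_id,))
    return len(rows)


record_payment(conn, 1000)             # payment and message both committed
record_payment(conn, 2500, fail=True)  # neither payment nor message survives
sent = []
processed = drain_outbox(conn, sent.append)
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

The failed payment leaves no orphaned message behind, and the successful one cannot lose its message, which is the property I care about for financial work. A real queue adds delivery guarantees across machines that this sketch does not attempt.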

I'm also assuming many end users will be Excel jockeys. Hence, giving them access to slice and dice 'their' data in Analysis Services cubes (still trying to wrap my head around the security implications of this one), and integrate said cubes into their own worksheets, instead of just providing canned reports, looks advantageous. Less work on my part as well:).

And, although I have written my share of database transfer and replication scripts in Perl and Python, the latest SSIS seems a better tool for that purpose. I'm assuming I need to generate reports involving data from multiple outside systems.

Damn, this ended up being a long post. Hopefully some will find it valuable. If nothing else, the pills I popped are finally winning the war against my Saturday 'morning' hangover:).


PostgreSQL. It scales very well in my opinion, although I have yet to get to Facebook sized numbers :-)

I second all the 'has real features and is more reliable' comments that precede this one; I add to those my number one favourite. I can develop on Win32 and deploy on Linux and apply backups, schema changes, what have you from one to the other without any pain at all. I started with MySQL but simply could not achieve the above. YMMV naturally.


postgres is the most flexible option. e.g. the google hibernate shards team mostly works off of postgres, postgres is built with proper transactional support right from the ground up, instead of a table option as with MySQL, and it's more open source than MySQL.

An article I wrote on postgres' scalability before, including some juicy links with shiny graphs:

http://www.zwitserloot.com/2006/12/02/database-land-postgres...

And as you seem to work for oracle, my 'endorsement' of the quality of that particular little blight on this world should make for something nice to report back to Larry.


Oracle and MSSQL are right off the table.

Personally I prefer PostgreSQL, but if you are a database newb I would recommend MySQL. (I really don't mean that in a bad way, MySQL is easier to use as a beginner, and it has proven it can scale)


If you're considering sqlite, you should read 'when to use sqlite' [http://www.sqlite.org/whentouse.html]. Also see this thread 'Thoughts on using SQLite?' [http://news.ycombinator.com/item?id=38287].


Are you sure you need a database? Maybe "none" is the right answer.

If it isn't, then maybe SQLite, PostgreSQL, or MySQL. At least, that's the order of preference for me, but I avoid databases when possible.

Google uses MySQL heavily, so it obviously scales (but I'm given to understand they had to work hard to do it).


Don't listen to the people recommending SQLite. Use SQLite when you're developing or when you want a database for a desktop application. Don't use it for a production web application.


I would recommend PostgreSQL, and you probably won't need to switch to Oracle if you use Postgres properly. Importantly, it's free and has a more usable and friendlier license than MySQL.

I am working on a startup idea which will also have high traffic and I am personally using Schevo - which is a non-SQL Object DBMS for Python based on Durus. I like Schevo because SQL sucks and using Schevo is like using a programming language - in this case, Python. You can check out the Schevo project at www.schevo.org. This would work for you only if you are using Python (highly recommended). Schevo does not have very good documentation yet, but it has excellent support on IRC and Google Groups from its creators - at least I have found them to give complete and quick responses. Hope this helps.


You should not use any, if you can get away with it. Start with Hadoop (http://lucene.apache.org/hadoop/) and see if you can wrap your head around that first before tying yourself to the RDBMS lifestyle.


For my startup I use MySQL, although these days I'm checking out alternative DBs. But based on my research it seems MySQL gets the most support from hackers (correct me if I'm wrong). In terms of big companies that use it, I do know Flickr uses MySQL.


I've seen studies showing PostgreSQL scales better but I assume it's the kind of thing where you can manipulate the data to get whatever results you want. I know MySQL is better supported but can anyone prove it is faster than PostgreSQL?


I see. Oh, please forgive me if I look like I'm link jacking, but I was wondering if we can touch base on your talks on UI design? It's been a topic of great interest to me.


no problem, I'll send you an invite to chat in gmail


Depends - if you just need data storage that's fast, and don't care about having 100% data integrity, Mysql seems popular. I wouldn't trust people's money to it, though.

I've long preferred PostgreSQL, because it's always had those things that make a relational DB a relational DB - foreign keys, lots of consistency checks, things like that. I have to admit, though, that as of Mysql 5, with InnoDB, it has started to resemble a "real database".

Of course, for some applications, data integrity really doesn't matter that much, and you'd prefer speed, so Mysql with its native db type might be faster.
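For anyone wondering what those consistency checks buy you, here is a small sketch. It uses Python's built-in `sqlite3` module purely as a convenient stand-in (it is neither PostgreSQL nor Mysql), and the `users`/`orders` schema and the `try_insert` helper are invented for the example. The database itself refuses rows that point at a missing user or carry a nonsensical amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
    )"""
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")


def try_insert(user_id, amount_cents):
    """Attempt to add an order; report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (user_id, amount_cents) VALUES (?, ?)",
            (user_id, amount_cents),
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


valid = try_insert(1, 500)        # existing user, positive amount
orphan = try_insert(99, 500)      # no such user: foreign key violation
negative = try_insert(1, -5)      # fails the CHECK constraint
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Only the valid order is stored. Without these constraints, the same mistakes would have to be caught by every piece of application code that writes to the table.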


MySQL and PostgreSQL are fine relational databases and you can build large systems out of them.

You might also consider embedded DBs such as SQLite (relational) or qdbm (non relational). If it were me, I'd go with one of these.

Good luck!


Start with PostgreSQL. Switch to Oracle if you need to and can afford it.


And if you want nearly everything Oracle has to offer at a fraction of the price, consider EnterpriseDB.


Just wondering what advantages Oracle has over Postgres? (I'm skeptical, but I honestly don't know.)

We chose Postgres over MySQL by a very fine margin, but we're pleased we did. Seems to have fewer "try to make your life easier but end up doing the opposite" quirks.


The optimizer is smarter and it's better at concurrency control. You'll get better performance out of Oracle when you have very-long-running transactions or lots of transactions at once operating on the same tables. The former isn't a concern for most web applications, but the latter might be.

The above is received wisdom from academics, though, so it may be dated. PostgreSQL has improved a lot in recent years.


If you're doing a start up then the whole point is to launch fast. Why would you want to choose a technology that could get in the way with licensing costs and complicated bloat?

Pick something that is free and good.


"... Obviously MySQL or other open source ones are the cheapest options ..."

do you need a database?


This is the honest answer: It doesn't matter.

(Assuming you're not considering MS Access)


in my opinion, i would suggest mysql. it's free, open source, the list goes on... i would like to ask if there are any african hackers who visit ycombinator.com? if not then i guess that i am the only one?



