Hacker News new | past | comments | ask | show | jobs | submit login
Why doesn't Quora use PostgreSQL (quora.com)
52 points by zende on July 26, 2010 | hide | past | web | favorite | 76 comments

This is a common mistake people make, I think. Just because an ecosystem is smaller does not mean it is inferior (or less practical). In fact, the opposite is routinely true.

Is it harder to find tutorials for PostgreSQL? Yeah, maybe. Is it harder to get high-quality advice and resources from subject matter experts? No; in fact, it's far easier. I have used PostgreSQL for many years and in all of them every question I've had has been answered quickly and properly by somebody in #postgres who knows way more about postgres and SQL than I'd ever even want to know.

And what about hires? When a million people "know" MySQL it takes a lot more effort to find somebody who actually does. There may be fewer contractors for PostgreSQL but they are practically by definition better than the average MySQL contractor.

unlike learning a programming language, with databases, the domain knowledge is harder to discover and more risky if you don't understand it

Exactly. So should you trust the ecosystem that relies on lazy, bandwagon developers for the majority of its market share, or the one that has been silently doing shit the right way for years? There may be a whole lot of really skilled MySQL admins out there (I guess there must be given how many huge sites go to extreme lengths to make MySQL work for them), but I can find you a handful of grossly overqualified PostgreSQL admins in no time. The same can't be said for the former.

Seconding #postgres as awesomely smart and helpful. Especially RhodiumToad, who always seems to be in there.

Oh man, RhodiumToad on #postgres totally saved my ass a couple of times like 2 years ago. Good times.

I see this #whatever notation a lot. What does it refer to?

irc chat, generally on the 'freenode' irc server - http://freenode.net/

Wow, really? Chat? In 2010??? That would never have occurred to me. Thanks!

IRC is still heavily used in the open source world. It's a great resource and you should add it to your arsenal of 'places to get support on FLOSS stuff'

Should we be doing video chat in 2010? People used telephones 30 years ago, and it's still popular now :-)

I'm on irc most days of the week. It's far from dead.

#whatever in the context of a 'channel' means IRC (Internet Relay Chat), generally on the Freenode.net network.

Agreed. It's like saying they should have written Quora using C# because there are more developers for it instead of a "fringe" language like Python.

My guess is that a small ecosystem is bad for pretty much everything other than databases:

- Programming languages have libraries

- Frameworks have plugins

- Operating systems have compatibility issues

The best code is the code you don't have to write.

There are a lot of "positive network externalities" in the computer world: the more people use something, the more valuable it is. That's true for databases as well.

However, what he's saying is that beyond a certain point, those effects tend to become a bit less important. Java is more popular than Python, for instance, but you can get pretty much anything done that you need to in Python. OTOH, a language like Erlang (sorry to pick on it, it's just an example) is still fringe enough that there are gaps in its libraries, it's tough to find people, and so on.

Echoing your point, having a library that can do something is fantastic. Having two is even better when you have a choice on which one feels more comfortable to use and is more suited for your project. Having 25 libraries that do basically the same thing gives diminishing returns because the community is split over which one is better and easier to use, while the majority won't be full featured and with a larger number of bugs overall.

This is not necessarily a black and white scenario that holds true to everything, however. There is no such thing as an absolute generic.

That's really interesting to know, actually. I've only been with companies that had MySQL installations (Facebook and Twitter) and so jumping in was really easy, but on the other hand I do like to do things right. A few questions, though:

0/ How would I be able to tell if the limitations of our DB are because of MySQL and would be solved by PostgreSQL? That is, common pain points in MySQL that we wouldn't recognize immediately until we switched? (Edit: mostly looking for performance-related, but anything is fine.)

1/ Is there a PostgreSQL consulting firm that has the same reputation that Percona does with MySQL?

2/ Is there a good place to start when there's an issue or question I have with PostgreSQL (aside from IRC)? With MySQL I've been either going to the docs, or to Google, but it's been spotty with either.

0: Empact lists a few here: http://news.ycombinator.com/item?id=1547192

1: There are a number of highly trusted PostgreSQL consulting firms; the last time my employer needed a consultant, we hired one from Fujitsu's dedicated PostgreSQL team.

2: So far, I've only needed the (deliciously comprehensive) PostgreSQL online docs and (sometimes) Google; most Google searches wind up on the official user/developer mailing-lists, so that would be another good place to ask.

http://www.postgresql.org/support/professional_support is good place to look for consulting firms.

Both http://www.pgexperts.com/ and http://www.enterprisedb.com/ are run by core PostgreSQL developers.

for the consulting firm question http://www.enterprisedb.com/ and http://www.commandprompt.com/ are large contributors to postgres and have great mindshare in the community

re 2: The mailing lists http://www.postgresql.org/community/lists/ are a good place to ask questions. Core devs are frequently answering questions there.

Maybe the postgres supporters can fill this in about Postgres reliability:


that's one of the biggest issues scaring people away from Postgres.

The replication is reliable, it's just that with the 8.0 releases, it's an add-on and not obvious to setup (e.g with Slony have to do special procedures to properly replicate DDL changes, it required having triggers on replicated tables, etc). In 9.0 replication is built in, transparent, and trivial to setup.

Sorry, but pre-9.0, Postgres replication is neither reliable nor easy to setup/admin. It's absolutely the worst thing about (pre-9.0) Postgres, and I'm glad the core team finally caved and made it a first-class feature.

Slony caused us no end of problems (large postgres installations at Last.fm). It's a monumental pain in the ass, it will choke on something, probably when you're trying to make a DDL change, sometimes just at random. Docs/help on trying to figure out what went wrong and how to fix the replication are not easy to find. After you managed to 'fix' it, you will be unsure if the slave is truly in sync. The only way to be confident is to nuke the slave and start over to ensure you have a consistent replica.

That said, I'd still gladly suffer the trials and tribulations of crappy trigger based replication than use MySQL.

I'm looking forward to trying replication in 9.0, it sounds fantastic.

You are correct about Slony but if you do not care about doing any reads at the slave in pre-9.0 you had log shipping and warm standby. Warm standby is perfect for using the slave for failover in case the master crashes or the HDDs break.

Warm standby is very reliable and easy to admin. The setup is not that easy but not too hard either.

nicely articulated, couldn't agree more

Ok. Being borderline trollish here, but if I read this article, this means to me "yeah. we are using a technically inferior solution and thus risking data loss and increasing the workload on our developers because our HR department gets more resumes if we search for an in-house database specialist".

Postgres usually does its thing and does it well. You usually don't get weirdness that requires you to even hire a specialist. And if you do, the consultants around are great and actually understand the problem.

As other people here have said: If you find someone listing PostgreSQL experience on their resume, you know they have the experience. If you find somebody listing MySQL, you'd have to check whether they are just listing it because they have heard of it or whether they really understand the problem domain.

So. DB specialists are settled. What about developers? PostgreSQL and MySQL are close enough in what SQL constructs they support (no wonder - MySQL provides a subset of SQL anyways), so coming from MySQL to PostgreSQL usually is trivial and the few things that are unsupported unix_timestamp() for example, you can write wrappers for if you really need them.

Of course, developers will at first find it strange that they can't, for example insert a 200 characters string into a varchar(10) column, but even the most braindead developer sooner or later will understand that silently truncating data usually is a bad thing.

No. The arguments listed in that article are just fear of the unknown and total non-issues (in german, there's this saying: "Was der Bauer nicht kennt, das frisst er nicht" - ever so true).

I'm glad I can work with PostgreSQL day-in, day-out. I'm glad I migrated that huge application from MySQL to Postgres back in 2000. I'm glad I did not touch MySQL for serious stuff ever since then.

I understand if you need to use MySQL if you want to run some software that runs only on MySQL. But if you are writing your own application, there's just no reason not to got with the real RDBMS.


  I'm glad I migrated that huge application from MySQL to
  Postgres back in 2000. I'm glad I did not touch MySQL for
  serious stuff ever since then.
You should refrain from commenting on MySQL then. It's amazing (and sad) how many pro-PG-anti-MySQL folks don't even know basic things about MySQL but feel entitled to rant and call it "toy DB", "not real RDBMS", etc.

note the emphasis on "serious stuff".

I did in fact touch MySQL since then in the process of setting up various open source applications that require MySQL.

While back then, I needed Postgres for some features MySQL didn't have but otherwise didn't really dislike MySQL, over those 10 years, I really learned to hate MySQL:

- try (erroneously) adding or inserting latin1-data into a UTF-8 table. You'd expect an error message, but instead your data gets truncated at the first byte with the high bit set. No warning. This means that you have to re-read and compare whatever data you stored to be sure that you didn't lose any.

- once, mysqldump stopped dumping after coming across a corrupt table. It did so while still exiting with a 0 exit code. That meant that for months my backups were incomplete.

- "alter table sometable add somcolumn integer not null" will gladly add the column and place null values in each row. Null values that should absolutely not be there. No error. No warning.

- Same goes for date columns, but there it's even better as it'll set the date to 0000-00-00 which is invalid if you try to add this in an insert or update.

- I said "corrupt table" earlier. Yeah. That happens with MySQL, whereas it never ever (with the exception of faulty hardware of course) happens with any other RDBMS - including PostgreSQL.

Now, in my opinion, a database that doesn't guarantee that the data you put in can then be read back out in unaltered form (or at all) is, really, no real RDBMS.


(edit: reason for not posting this in my original reply: I'm basically repeating above points over and over here on HN and I thought people would know my gripes by now)

So basically you learned to hate MySQL while not doing anything serious with it. Interesting.

  - try (erroneously) adding or inserting latin1-data into a UTF-8
  table. You'd expect an error message, but instead your data gets
  truncated at the first byte with the high bit set. 

  - "alter table sometable add somcolumn integer not null"
  will gladly add the column and place null values in each
  row. Null values that should absolutely not be there. No
  error. No warning.

  Same goes for date columns, but there it's even better as it'll
  set the date to 0000-00-00 which is invalid if you try to add this
  in an insert or update.
That's why I said those who don't know MySQL like to complain — hardly any of doing the rants have heard about sql modes:

  mysql> alter table t1 add column (dob date not null);
  ERROR 1292 (22007): Incorrect date value: '0000-00-00' for
  column 'dob' at row 1

  - I said "corrupt table" earlier. Yeah. That happens with
  MySQL, whereas it never ever (with the exception of faulty
  hardware of  course) happens with any other RDBMS - including
Funny. Googling with right keywords paints a bit different picture. Must be some other PG, I guess.

          - try (erroneously) adding or inserting latin1-data into a UTF-8
          table. You'd expect an error message, but instead your data gets
          truncated at the first byte with the high bit set. 


        - "alter table sometable add somcolumn integer not null"
        will gladly add the column and place null values in each
        row. Null values that should absolutely not be there. No
        error. No warning.

ok. it inserts them as 0 - that's in improvement from last I checked. Still. It silently does stuff. It should not let me do it in the first place.

Same goes for the date. And please don't tell me the error message you get back there is quite correct. That alter table statement tries to silently add a default and later fails due to the default not being valid.

Also, see this:

    mysql> alter table foo add dob date not null;
    Query OK, 0 rows affected (0.00 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    mysql> describe foo;
    | Field    | Type        | Null | Key | Default | Extra          |
    | id       | int(11)     | NO   | PRI | NULL    | auto_increment | 
    | value    | varchar(10) | NO   |     |         |                | 
    | intvalue | int(11)     | NO   |     | NULL    |                | 
    | dob      | date        | NO   |     | NULL    |                | 
    4 rows in set (0.01 sec)

this table now contains two columns that can't be null and yet the default is null. Thanks.

Concerning corrupted tables. I have around 1.5 TB of data spread over ~10 databases and over the last 10 years, I never had a corrupt table that I could not trace back to hardware failure (harddrives, raid controllers, RAM).

Over the same period, I had around 200 MB of data spread over ~10 databases in MySQL and I had three cases of corrupt tables that I could not trace back to defective hardware as various diagnostic tools yielded no error and the machines still run without problems with PostgreSQL.

Maybe I'm just unlucky.

Maybe you just don't know what you are doing.

Maybe you just have nothing to say and don't recognize that such behavirs are clearly wrong, that failing silently for a database is a case of bad design, and MySQL is just technically inferior to Postgres. Or are we missing something?

Yes you are. The first thing you are missing that such behaviors are configurable: you can choose whether to fail silently or get an error. The second thing you are missing is the reason such behavior exist in the first place — namely, there are usage scenarios when it is preferable. Having ability to choose which trade-offs to have is in no way technical inferiority.

Never in my entire experience as a software developer have I seen a case where a database should fail silently. It's bullshit and you know it.

A few bugs/limitations of Mysql not shared by Postgres. All AFAIK.

* Integer columns with null constraints will silently translate nulls to 0s, rather than throwing an error (as constraints in all other contexts do). Imagine a database which, on violation of a foreign key constraint, just silently picked the first row in the target table - it's just insane.

* Transactions can't roll back schema changes, so if you have a Rails migration which fails on the application of a unique index in the midst of other operations, too bad. Your DB is stuck in the no-mans-land between migrations.

* Constraints are anemic relative to postgres: no integer range or string length constraints, no whitelist, blacklist, regex match, &c. &c.

There are certainly others I've run into, but the above indicate to me that the Postgres developers are working with/delivering a better code-base.

> Transactions can't roll back schema changes

That's a feature of postgres rather than a limitation of mysql however. As far as I know the SQL Standard doesn't ask for transactional DDL, Oracle has pretty much no support for transactional DDL (in 10g anyway, each DDL statement is executed in its own transaction and any pending transaction is COMMITted before a DDL), I have no idea about SQL Server.

(and don't get me wrong, it's a totally awesome feature and it should be praised to high heaven, but it's an additional feature nonetheless, rather than a limitation of the concurrence)

MSSQL supports it because it was inherited from Sybase. SQLite also supports it as well. You're right that it's a feature, but frankly I can't live without it -- there's no way to write safe schema migrations!

Is quora flooding their own site with self-posed questions trying to get publicity and rankings? The only time I've ever heard of it is from the oxygen of publicity HN seems to give it.

Nah. You can usually (but not always) tell who asks the question by checking the first person who's following the question, or by checking the question history by clicking on the timestamp.

It's interesting to see how PostgresSQL didn't get nowhere near the popularity of MySQL. I think a few major things contributed to this:

1) MySQL focused on performance first while PostgresSQL positioned themselves as "feature-complete."

Of course, that was years ago and in the meantime they became comparable in both aspects. MySQL+InnoDB is ACID compliant and Postgres matches the InnoDB performance.

Sure, Postgres might still have the upper hand in the variety of features but MySQL has MyISAM. If you don't need transactions you can get a significant performance boost over Postgres. Speed IS a feature and one could argue it's more important to many than all the nice extra things Postgres has.

2) MySQL was compatible with mSQL so they got a pretty good user base right from the start. mSQL later became obsolete.

3) Inertia. Even if Postgres would be net superior to MySQL (think feature complete but faster than MyISAM) it would still take a lot of time to catch up, if ever. There are tons of tools already developed for MySQL and a huge knowledge base around it. Just like C++, Windows, or any other big technology, there's a lot of inertia to it.

I think having a "launch" at the right time might have had some impact also. MySQL launched in 1995, right around the time that the idea of generating hypertext dynamically was taking off, so a bunch of web stuff was looking for a lightweight, fast DB without necessarily a lot of features. Postgres had been around since the 1980s as a research project, and in 1994 was publicly released as an open-source project and added SQL support (rather than its own query language), but it was sort of off the internet's radar as a research project mostly known in academia. The official launch of the rebranded PostgreSQL was in 1997, but that might've been a bit late already, since 1995-1997 were a fairly crucial two years for the LAMP stack's birth.

Most people don’t realize how bad postgress was at that time.

I tried postgress in 1995-96. I was tiered of mSQL [M-SQL not MySQL] and really wanted a free SQL server to work at backend of our web apps. Downloaded, compiled and installed what was then called postgress95 on our linux server. And then I could connect to it from localhost using the command prompt. But the instant I connected with it from a remote machine over TCP/IP the server crashed! I really wanted to make it work, couldn't. Note that Internet was still very new and support availability wasn't as wide as today.

Discovered and started using mySQL which was then way better than mSQL, And still using it after 15 years.

Postgres was also a bitch to install and work with in the early years, even on really supported platforms.

The usability push started with the performance one, from 7.x onwards (2000) and especially with 8.x (2005).

Most important of all: PHP.

I'd say most important is LAMP and that unfortunately Postgres starts with the letter P.

MySQL being easier to install on Windows from 1995 to 2005 might also have had something to do with it. PostreSQL had to be run in Cygwin until release 8.0 in 2005.

Most of MySQL's momentum I'd say would be because Slashdot used it. Because of course, a database that is (only) suitable for a scenario where reads outnumber writes a few thousand to one, and a comment getting lost is actually no big deal is obviously suitable for your OLTP app...

error 500. maybe its unrelated to them not using postgresql.

here's the correct link: http://www.quora.com/Why-doesnt-Quora-use-PostgreSQL

HN must have stripped out the ' from the link when I submitted it

grrrrr. apparently HN always strips single-quotes from links. There should be single-quote in the link in doesn't.

Here's a bit.ly link :-( http://bit.ly/9iQWn8

You could always URL encode it: http://www.quora.com/Why-doesn%27t-Quora-use-PostgreSQL

That way everyone clicking on the link from the home page doesn't get taken to the 500 page.

Argh, that is really annoying though, and our servers really should be able to figure out that the two links, apostrophe or not, should lead to the same question. I'm a bit late to the game; was the original link up top unchanged?

Edit: should be fixed now.

There are a number of companies that provide support for PostgreSQL. Enterprise DB is probably the most prominent of them. Kind of funny that they didn't have a good technical reason not to use PostgreSQL, just that the community was smaller.

Postgres 9, with streaming replication and easy to setup bullet proof read only slave replication will probably move momentum in the direction of PostgreSQL. The replication was always easier to setup on MySQL, which gave it an advantage.

This article is an example of choosing easy option over the better option. From a startup's perspective, it's completely understandable. Thousands of cheap MySQL admins, or a few hundred expensive PostgreSQL admins?

As for the "nobody uses PostgreSQL" argument; Facebook and Twitter aren't an endorsement of MySQL because any Linux hosting provider out there will have MySQL installed BY DEFAULT. PostgreSQL would require you to upgrade to a more expensive account (for root access) and install it yourself. As a result, all the hobbyists learn on the free and always available MySQL when building "The Next Big Thing." Remember, Facebook was started in a college dorm room.

Excellent lesson to other startups: When two or more products do what you need, choose the one you can more easily hire help for.

(not that there aren't a fair number of postgres experts, but there's no question there are far more competent mysql experts)

No question, really?

Those who commented in favor of postgres: What are some of the biggest deployments of postgres? Facebook has a gargantic deployment of MySQL and it just works. The author of the answer on Quora mentions reliable scalability as the most important factor for their MySQL choice on another answer here: http://bit.ly/dm6HtQ. What is the biggest system that postgres is deployed in?

From http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyt... :

Yahoo builds two petabyte PostgreSQL database James Hamilton writes about Yahoo's "over 2 petabyte repository of user click stream and context data with an update rate for 24 billion events per day".

It apparently is built on top of a modified version of PostgreSQL and runs on about 1k machines. In his post, James speculates on the details of the internals. Very interesting.

A better question would be: who uses MySQL as anything but a glorified key-value store when reaching large sizes? I personally have been involved in trying to use MySQL as a "real database" in the 10-100TB range and let me tell you, it's not pretty. I'm not sure about the open source PostgreSQL, but I know Greenplum has petabyte level warehouses running on a distributed version of it.

A few observations:

1) MyISAM's performance is highly dependent on certain idiosyncrasies of a lot of applications. Using MyISAM in this day and age is a very bad idea. InnoDB at least gets closer to real database behavior.

2) The "query optimizer" is insulting at best and actively impeding getting things right if you use it for much more than simple queries. Something that's more along the lines of what really large databases (as opposed to KV stores) get used for can implode the server.

Personally, I think too many people try and stick things in relational databases that don't belong there simply because they've got the hammer in their hand and it's easier than pulling out a screwdriver.

who uses MySQL as anything but a glorified key-value store when reaching large sizes?

My assumption is both Quora and Facebook use MySQL this way. While you are right that this is not using it as a real database, I want to know if PostgreSQL is deployed in a similar setup at all. Most of the people (including you) don't take into account the fact that there are many cases where MySQL (used as a KV store) proved to work, while I have never heard of such huge PostgreSQL deployments. If this was a general discussion regarding MySQL and PostgreSQL I could understand that, however, I think the post is more about whether to choose MySQL or PostgreSQL if you are going to use it as a KV store.

The introduction of hstore actually lets you do this natively. I think it's partially that PostgreSQL people tend not to try and use the hammer as a screw driver, but maybe that's just me.

Or maybe sometimes MySQL is a screw driver being used as a hammer?

Some examples:

Skype uses PostgreSQL for their VOIP services.

Yahoo! has a customized PostgeSQL for their data warehousing storing a couple of petabytes of data.

I believe IMDB is another quite prominent user.

Tineye.com, for those who have heard of it, also uses PostgreSQL (I set it up). It is mostly a KV store with a metadata join, but it's +1.5 billion rows returning +10 random uncached rows in <100ms.

Originally we used MySQL, which revealed its true face at ~500 million rows. Queries that were <300ms suddenly turned into 2-3 minutes because the query planner decided it would be fun to do a full scan or somesuch. Migrating to PostgreSQL not only reduced the query time of the same queries by more than half but also allowed us to scale the number of rows pretty much linearly with a small constant.

Then I did some horizontal partitioning and things got really awesome.

That's why database abstraction is so nice. If I ever get sick of MySQL - or say, decided I didn't want to use Postgres - it's not hard to switch!

Database abstraction isn't, or at least isn't practical.

Small projects don't need an interchangeable database. Bigger projects won't let you get away without making assumptions about the database. Nothing but the basics work the same between SQLite and ORACLE.

SQLite... Oracle? never used either one, but I believe it.

I'm talking about MySQL vs. Postgres.

Depends on your abstraction system. If we're talking about SQL, the incompatibilities abound. And ORM's or other libraries tend to be either very, very complex or tend to implement everything in the most simple way. Both not very good for performance. Never mind that most frameworks don't have the same level of support for every DB (open source libraries tend to have very limited Oracle/SQL Server support)

And this is only talking about queries, once we get to triggers, constraints or even stored procedures…

I'm thinking on a fairly medium scale, I guess, and mainly just about PG/MySQL. (You mean other databases exist?? Never even seen such a thing!).

It's very helpful to know what's going on behind the scenes of that pretty little database abstraction, though.

If I ever have to care, hopefully I'll be hiring some guy with a beard to think about that for me.

there's one useful thing about postgresql:

specifically, interviewees that inform you you should be using postgresql can be immediately rejected.

"do you use mysql or postgresql?" "oh, mys-" "YOU SHOULD SWITCH TO POSTGRESQL!!"

I think this can be extended to any interviewee who starts redesigning your stack before they're even hired can be rejected.

I don't know. Personally, I'd prefer an interviewee that comes to the interview with an open mind and actually tries to make suggestions based on the view of the infrastructure that they have acquired based on the information I have given them or they inferred in their research of my company.

Of course they might be wrong because naturally, they can't have the full picture. But I'd much rather have an employee I can hold a discussion with and get input from then one that blindly does what I tell them to. In the end, I might as well have made a bad decision or overlooked something.

In fact, having an engaging discussion during the interview is quite a sure-fire way to get hired if you are in an interview with me (provided the stuff you talk about makes sense).

Hmm. Probably, but there are people out there who are qualified to know how to scale better than you, presumably.

On the other hand, people who are religious about technology choices (faith rather than logic) tend to be bad matches. People who don't revisit their decisions don't make great engineers. I don't want choices made because of a previous lifestyle decision.

Totally true. Anybody who is a fanboy of a certain technology is not a good hire. It's always good to have passionate people, but passionate about an idea, not a tool. The bottom line is that in most cases, it doesn't even matter, just as long as things are getting done. If the people you work with don't appreciate that, you've got deeper problems.

I love the fact that this got downvoted to zero. More religion.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact