

Postgresql vs mongodb - ibotty
http://blog.pingoured.fr/index.php?post/2012/05/20/PostgreSQL-vs-MongoDB

======
moe
If you insist on using a general-purpose database instead of a dedicated
search engine (Solr, Sphinx) then you should at least read the manual:
<http://www.postgresql.org/docs/9.1/static/textsearch.html>
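
For reference, the built-in search described there boils down to a tsvector column plus a GIN index. A minimal sketch only; the table and column names here are invented, not taken from the article:

```sql
-- Hypothetical table of mailing-list posts.
CREATE TABLE posts (id serial PRIMARY KEY, body text);

-- Precompute the tsvector and index it so searches don't re-parse every row.
ALTER TABLE posts ADD COLUMN body_tsv tsvector;
UPDATE posts SET body_tsv = to_tsvector('english', body);
CREATE INDEX posts_body_tsv_idx ON posts USING gin (body_tsv);

-- Ranked full-text query.
SELECT id
FROM posts
WHERE body_tsv @@ to_tsquery('english', 'replication & failover')
ORDER BY ts_rank(body_tsv, to_tsquery('english', 'replication & failover')) DESC;
```

A trigger (e.g. tsvector_update_trigger) keeps the column current on writes.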

~~~
llimllib
(Which existed in 8.4 as well:
<http://www.postgresql.org/docs/8.4/static/textsearch.html> . Not sure why
he's not using version 9)

------
antirez
The first law of DB comparisons says: in order to compare two database
systems, a deep understanding of both is required. However, the more expert a
user is, the less likely he is to compare two systems (he will just pick what
seems more suitable for the task); thus almost all comparison attempts are
garbage.

~~~
55pts
Antirez, I have a great deal of respect for your work, but calling somebody
else's work "garbage" is not cool.

Constructive criticism in order to make the benchmark more relevant would've
been more helpful.

~~~
antirez
Hi 55pts, don't get me wrong, I understand what you mean, but it is like
saying that because I happen to contribute to some popular software I don't
have the right to express my thoughts freely.

Garbage here means: not very useful. More politically correct people are going
to use a different word, but the concept is the same. DB comparison /
benchmarking is hard, and I read many blog posts by people who apparently
don't even have a solid understanding of or experience with one of the systems
(sometimes with both of the compared ones), so these articles are really not
going to help but to confuse other users.

~~~
gbog
Garbage means "shit"; it has never meant "not very useful".

You being Dennis Ritchie's reincarnation or John Doe changes nothing about
that, or about the fact that you have been disrespectful of someone else's
work.

Personal opinion: on HN you have the right to be disrespectful of a work that
is of mediocre quality (which is the case here) if you provide the community
with more details on why it is bad and how to improve it, and, more
importantly, if you do not deny being disrespectful just two posts below by
changing the meaning of a word on the fly.

~~~
antirez
That's your opinion about what HN should look like; mine is different, more
like this:

1) This kind of puritanism you express is not part of the hacking culture at
all.

2) To compare two database systems that are the _hard_ work of a lot of
people, without a solid understanding, is the real problem here, because it
means spreading shit about one or the other, and also influencing potential
users trying to decide what to use.

So I think that the only people who should be worrying about their behavior
are the ones posting incorrect stuff.

But because I believe in freedom I don't want to prevent them from posting
with some kind of pre-post filter. Nor should you try to prevent me from
saying that this kind of blog post is often garbage.

------
masklinn
> MongoDB being a NoSQL database using dictionary has a much more flexible
> structure.

Not really: Postgres has hstore[0], which provides queryable schemaless
key:value storage.

Furthermore, testing is done with 8.4, current version of Postgres is 9.2 and
has further performance improvements over 8.4 (and more flexibility for
"unsafe" data storage in the form of unlogged tables).

Finally, no note on the configuration of Postgres and whether it's been tuned
correctly, only leaving the inference that he's using the default
configuration... which isn't exactly performant.

[0] <http://www.postgresql.org/docs/9.0/static/hstore.html>
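
In rough strokes, that looks like the following (a sketch; the table and keys are invented for illustration):

```sql
-- hstore ships in contrib; from 9.1 on: CREATE EXTENSION hstore;
CREATE TABLE docs (id serial PRIMARY KEY, attrs hstore);

INSERT INTO docs (attrs)
VALUES ('author => "alice", list => "mailman-users"');

-- A GIN index makes containment queries cheap.
CREATE INDEX docs_attrs_idx ON docs USING gin (attrs);

-- Schemaless lookup: rows whose attrs contain the given key/value pair.
SELECT id FROM docs WHERE attrs @> 'author => "alice"';
```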

~~~
lobster_johnson
He was using the default config, says so in the comments section.

He also says in the comments that he did not know you could index case-
insensitively in Postgres.

This article is completely worthless.
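
(For reference, the case-insensitive indexing he missed is just an expression index; names here are invented:)

```sql
CREATE INDEX users_email_lower_idx ON users (lower(email));

-- Matched automatically by queries using the same expression:
SELECT * FROM users WHERE lower(email) = lower('Foo@Example.com');
```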

~~~
fdr
Not completely worthless: it does say something about the ease of getting good
performance without having to know much about the ecosystem. But it doesn't
speak to even the first "easy" 80% of what one would do to optimize
performance.

I appreciate this work with numbers, and the code to run it. I'm sure it can
be adapted to take on these two suggestions.

~~~
bsg75
> it does say something about the ease of getting good performance without
> having to know much about the ecosystem

The root cause of most "I used [MongoDB, PostgreSQL, MySQL, CouchDB, ...] and
my site fell over" articles.

~~~
lobster_johnson
"Getting good performance without having to know much about the ecosystem" is
pointless, because choosing a tool is all about learning how to wield it. It's
true about any tool, even something like a hammer.

Software is complicated stuff and not something to treat superficially. Like
everything in life, optimum results require a deep understanding.

This article is worthless because the results are not useful to anyone for
anything. Doing serious benchmarks requires a particular and fairly rare set
of skills.

~~~
fdr
> "Getting good performance without having to know much about the ecosystem"
> is pointless, because choosing a tool is all about learning how to wield it.
> It's true about any tool, even something like a hammer.

There is no complete understanding of any non-trivial system, even for its
authors. There is a question of how much one can understand in a reasonable
amount of time, and what fraction of a perceived ideal one can achieve. Some
systems have steeper curves than others.

> This article is worthless because the results are not useful to anyone for
> anything. Doing serious benchmarks requires a particular and fairly rare set
> of skills.

It is not worthless because there is some code posted to run it. Someone could
beef up the corpus (it's pretty small right now) and apply their tweaks (or
try other data types like hstore, or other indexing strategies). This person
tried his/her best and put something together.

I agree the benchmark is pretty flawed, but given the thankless (and often
adversarial!) nature of running benchmarks and publishing results, I'm
grateful for anyone who does one and then publishes the code.

------
lysol
_@charlie, I did not change anything to the configuration of my PostgreSQL, so
I use the defaults provided by RHEL._

He didn't even tune his database server. That automatically invalidates this
comparison.

------
tferris
As much as I like Mongo and comparisons to other DBs, I doubt that your
comparison really helps anyone:

=> I found it hard to get your message by just skimming the blog post: the
first results lack measurement units (is it requests per second or total
requests?). The graphs were even more confusing: they do not quickly give the
reader the relevant information, and the conclusion isn't clear.

=> I missed information on whether you set up the respective indexes for
querying Mongo (correct me if I am wrong). And I don't know if you used the
new hstore feature in Postgres.

=> This is still a micro benchmark with limited expressiveness. The overall
use case, data model and queries are directed rather towards NoSQL and general
full-text search capabilities. Take a use case with lots of joins/links and
random reads and writes and we will have a totally different picture (thus,
your title is misleading).

So if one wants to make a comparison between Mongo and Postgres, I would like
to see how Postgres' hstore and the new JSON datatype feel in terms of daily
usage and performance compared to Mongo (because these two new features
compete directly with NoSQL DBs, and the latter with Mongo in particular).

------
zomby
Breaking news: an old version of Postgres used badly is slow.

~~~
ibotty
breaking news: even an old version of postgres used badly is not that slow.

------
gbog
> a NoSQL database has a much more flexible structure

In some cases you really have too much diversity of data structures, and then
a NoSQL solution can be helpful. An example comes to mind: a notification
engine with tens of diverse notification formats, each with its own variable
data and localized display templates.

But if your data is well structured, like in this example, it is not true that
SQL is inflexible. I have used PostgreSQL to manipulate on the fly the
relatively big UNIHAN list (50,000 Chinese characters with their attributes)
and could rearrange the data any way I wished, creating and updating columns
and building grouped tables, and I got a working database for an online
dictionary without any pain.

If I had had to run a script for each SQL command, it would have taken much
longer, and I would not have had the easy begin, modify, check,
commit-or-rollback loop.

I sometimes wonder whether enough people who use SQL use it through the
command-line interface.

~~~
mickeyp
> In some cases you really have too much diversity of data structures, and
> then a NoSQL solution can be helpful. An example comes to mind: a
> notification engine with tens of diverse notification formats, each with
> its own variable data and localized display templates.

Actually the relational-database solution to that problem -- an arbitrary mix
of various data types -- is to use a table with a variety of columns of
different data types. For instance: Str1, Str2, Str3, Str4, Int1, Int2, Int3,
etc. -- as many as you need to store what you want to store. Obviously they
should all be nullable, and you would need a PK (and FK) to whatever it is you
are storing them with. Next, you build views on top of this for each different
type of data stored in the mix table, and in each view you name the columns
sensibly. Using triggers on the view (or without: Oracle can figure out most
inserts if the SELECT query is trivial) you can insert data into the mix table
as well.
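
The scheme above, sketched (all names invented; on Postgres 9.1+ an INSTEAD OF trigger would handle inserts through the view):

```sql
-- Generic "mix" table: a handful of nullable, typed slots.
CREATE TABLE mix (
    id   serial PRIMARY KEY,
    kind text NOT NULL,
    str1 text, str2 text, str3 text,
    int1 integer, int2 integer
);

-- One view per kind of data gives each slot a sensible name.
CREATE VIEW email_notifications AS
SELECT id, str1 AS recipient, str2 AS subject, int1 AS retry_count
FROM mix
WHERE kind = 'email';
```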

~~~
gbog
You mean materialized view, right?

Anyway, you just demonstrated my point: in this case NoSQL may be the more
flexible (simpler) option.

Another relational solution is to use a single entity-attribute-value table
and drop the strong typing.
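
The EAV variant, for comparison (again with invented names; typing moves into the application):

```sql
-- One row per attribute; every value stored as text.
CREATE TABLE eav (
    entity_id integer NOT NULL,
    attr      text    NOT NULL,
    value     text,
    PRIMARY KEY (entity_id, attr)
);

SELECT value FROM eav WHERE entity_id = 42 AND attr = 'subject';
```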

------
ibotty
of course that post is not best practice for a sql-database search and is not
optimized at all. what it shows is that even then, on a single host, a sql
database is in the same ballpark as nosql databases.

------
buster
I must say, those boxcharts are a horrible way to represent the data.

~~~
adeelk
Why do you say that?

~~~
buster
don't know where to start..

1. It's confusing (at least for me).

2. A lot of his charts don't start at 0, which leaves the impression that e.g.
MongoDB is 3 times faster than PG, when in reality the difference was more
like 5%.

3. One scrollable page with multiple pictures would have been much better,
imo.

4. About the page itself: I don't see how many times the benchmarks were run.

The conclusion shows that even the writer is misled in interpreting the
charts... "PostgreSQL performs worse than MongoDB in the retrieval queries but
the time it takes is negligible (we are way under 1 second)". What does it
matter that it's under one second?! That's stupid. His charts show, for
example, for get_email 0.001 seconds versus 0.005 seconds. That's 5 times
faster! Saying "that's so small, I don't care" sounds bad. Unless the author
meant some other charts...

Also, while I'm at it, I am not sure this is a good benchmark at all. 0.001
seconds is an extremely small value to measure, and I doubt it is useful for
comparison. It might be OK if this were some "real-world" example where
queries don't take long. But the benchmark is not "real-world". At all.

I would have liked to see a more realistic one (the dataset is OK), with
multiple concurrent queries and also with measurements of CPU and memory
consumption.

~~~
sesqu
I wouldn't say it's confusing, but the inconsistent baselines are very
misleading. I'm also pretty surprised that some of the tests are anywhere near
comparable, since they had different indexes.

------
Luyt
Somewhat more general and with a humorous undertone:

<http://www.mongodb-is-web-scale.com/>

Not only the video, but a transcript too.

------
clu3
I think the dataset is way too small: only 166K rows? That's not really
interesting. If you want to compare, make it at least a few million.

~~~
salimmadjd
166k is kid's play. At this size we will not see any measurable advantage
among serious contenders. The key data is how these perform on multiple
servers, with sharding, master-slave configurations, etc.

------
daemon13
Why didn't you use PG 9.1 / 9.2 with new kernel?

Also, why old Mongo?

Would love to see benchmarks of most recent stable.

~~~
ibotty
(not very informative, but..) i am not the one who did the blog post. i just
found it interesting (reasons stated somewhere else). so i cannot answer any
questions about it.

------
arthurprs
I know this is a specific test, but some things to consider:

The test dataset is negligible for such a powerful machine.

The test ran with a recent MongoDB and an outdated Postgres.

MongoDB uses all available memory; Postgres needs to be manually tuned in the
config files.

~~~
taligent
What are you talking about ?

He is using an outdated version of MongoDB as well.

------
dr_rezzy
Why did mongodb drop 30 docs from its index? Did PG drop any? Does this not
matter?

------
thibaut_barrere
One thing that I've never seen compared so far is the RAM/disk consumption for
the same dataset.

Did anyone make a comparison of this kind, out of curiosity?

------
sasha-dv
What exactly is the author's conclusion? To ask more questions?

It seems to me that in a post named X-vs-Y you should arrive at some sort of
conclusion. Say, _X is the way to go_ or something like that.

I'm not a db geek and I'm not pro-NoSQL or pro-RDBMS. Having no horse in the
race, and a very modest knowledge of DBMSes, I wonder what would be the best
choice in this specific use case (building a mailman archive)?

------
jacques_chester
Doesn't MongoDB automatically index columns? Wouldn't that account for a lot
of the difference?

~~~
adeelk
No, it doesn’t.

<http://www.mongodb.org/display/DOCS/Indexes#Indexes-Basics>

