

MongoDB vs. Clustrix Benchmark - sergei
http://sergeitsar.blogspot.com/2011/01/mongodb-vs-clustrix-comparison-part-1.html

======
antirez
In this article MongoDB == NoSQL, but that is not the case. Different NoSQL
solutions have different use cases. Also IMHO MongoDB is pretty SQLish in the
data model, so you are actually comparing two implementations of a similar
data model here, and one may be superior to the other or the other way
around, I guess. No surprise.

A more interesting attempt is IMHO to check how the difference in the data
model of some NoSQL solution can lead to very different performances.

For instance Clustrix VS Redis can be interesting. Examples:

1) A lot of writes against a table where you then need to get things back
ordered by insertion time. With Redis it is just LPUSH + LRANGE. Try to do a
read/write test where many clients are writing and reading at the same time
(real world), against a table (or Redis list) with millions of elements.
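
The Redis side of that workload can be sketched in plain Python. The deque
below stands in for a Redis list (LPUSH prepends in O(1), LRANGE reads a slice
in insertion order); with redis-py against a real server the calls would look
the same, but this sketch needs no server:

```python
from collections import deque

# Stand-in for a Redis list: LPUSH prepends, LRANGE reads a slice.
# Against a real server this would be r.lpush("events", item) /
# r.lrange("events", 0, 9) via redis-py; the semantics are the same.
events = deque()

def lpush(item):
    events.appendleft(item)          # O(1), like Redis LPUSH

def lrange(start, stop):
    # Redis LRANGE is inclusive of both ends.
    return list(events)[start:stop + 1]

for i in range(1000):                # many writers appending events
    lpush(f"event-{i}")

latest = lrange(0, 9)                # newest-first page, no ORDER BY needed
```

The point is the data model: the newest-first page never needs an index or a
sort, because the list *is* the insertion order.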

2) Range queries when there are a lot of writes against these indexes. For
instance a table with a score (we are modeling an online game leaderboard),
with a lot of inserts of new scores. Get ranges between random intervals at
the same time. Again, many clients writing, many reading.
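
Point 2 maps onto a Redis sorted set (ZADD to insert a score, ZRANGEBYSCORE
for the interval query). A server-free sketch of those semantics, with a
bisect-maintained list standing in for the sorted set (the zadd/zrangebyscore
names just mirror the Redis commands):

```python
import bisect

# Stand-in for a Redis sorted set: entries kept ordered on insert, so a
# range query is a binary search plus a slice.
board = []  # sorted list of (score, player)

def zadd(score, player):
    bisect.insort(board, (score, player))   # O(log n) search + insert

def zrangebyscore(lo, hi):
    left = bisect.bisect_left(board, (lo,))
    # (hi, <max codepoint>) sorts after any (hi, player) entry.
    right = bisect.bisect_right(board, (hi, chr(0x10FFFF)))
    return board[left:right]

for i in range(1000):                       # lots of new-score inserts
    zadd(i % 100, f"player-{i}")

mid = zrangebyscore(40, 42)                 # random interval query
```

Because the index is maintained on every write, the interval read is cheap no
matter how many clients are inserting concurrently.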

------
bryanmig
So a guy who is an expert in Clustrix (and knows how to setup, tune, etc)
compares it against some other technology that he does not know (and does not
know how to setup, tune, etc) and comes to the surprising realization that his
technology is better?

Where have I seen this before? Oh right.. every time I see "Technology A vs
Technology B" comparisons.

Naturally his results are in his favor, otherwise he would not have posted
them.

~~~
jbellis
In fairness, you don't have to tune MongoDB poorly (deliberately or otherwise)
to get poor performance with a workload involving substantial numbers of
writes; it's well-documented that there is a global lock
(<http://www.mongodb.org/display/DOCS/How+does+concurrency+work>)
that prevents reads during write operations.

That said, there are certainly nosql systems with better scaling and
concurrency stories* than mongodb out there that he could have benchmarked
against. :)

*I'm a cassandra committer

~~~
dolinsky
Just a point of clarification, but it is a per-server lock, not a global lock
across the whole database.

------
strlen
Pet peeve: eventual consistency isn't for scalability and performance, it's
for availability. In a well designed system, the whole debate only matters
during a failure condition: a strongly consistent system gives up
availability upon a certain kind of failure, an eventually consistent system
gives up consistency upon a certain kind of failure.

There are strongly consistent scalable "NoSQL" systems e.g., BigTable.
Megastore even provides complex distributed cross row transactions.

In a well tuned system, loss of availability (in a failure scenario) could be
minimized to seconds. What this means for performance is that you have systems
which encounter second-long latency spikes upon failures.

It also isn't a binary switch:

1) Quorums can be used to achieve read-your-write consistency in the case of
simple failures (loss of 1 node out of 3).

2) There are multiple kinds of relaxed consistency models. One of them is
serializable consistency: you may get a stale read, but the order of reads
is the same as the order of writes. This is used by PNUTS and can be achieved
by serializing the writes through a single master. This means that there's
(again, for a short time period) loss of _write_ availability in a failure,
but there's no loss of read availability.

3) Paxos/multi-Paxos can be used to achieve atomic writes (all available nodes
receive the write) while withstanding simple failures (similar to quorum
protocols... and I believe multi-Paxos uses quorum protocols under the covers
to improve liveness over "raw" Paxos). This is at the cost of higher latency
(and complexity). [Edit: In this case, you're dealing with full-blown strong
consistency, but with -- at the cost of latency -- the ability to tolerate
certain kinds of simple failures/trivial partitions]
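
The quorum claim in point 1 reduces to simple arithmetic: with N replicas,
requiring W acks per write and R replies per read guarantees that any read set
overlaps any write set whenever R + W > N, so a read still sees the latest
write after losing one node out of three. A minimal sketch (N/R/W are the
conventional names, not any particular product's API):

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """True if every R-subset of N nodes intersects every W-subset."""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))

# Classic configuration: N=3, R=2, W=2 tolerates one failed node while
# still guaranteeing that some read replica holds the latest write.
ok = quorums_overlap(3, 2, 2)   # True, since 2 + 2 > 3
```

The brute-force check just restates the pigeonhole argument: two subsets of
sizes R and W drawn from N nodes must share a node when R + W > N.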

Clustrix looks interesting, but it addresses the scalability and performance
issues with RDBMSes, not the availability issue. If an RDBMS were to drop the
"A" and "I" in ACID (the C in "ACID" means a serializable view of the
execution, which is not the same as the C in "CAP": the latter means all
nodes in a cluster agreeing on what the data is, which isn't required for the
former), it would be possible to build a highly available, low latency RDBMS;
but it would also not be as useful without atomic and isolated cross-row
transactions.

[Disclaimer: I work on a Dynamo-style database, but I'm fond of PNUTS as an
architecture (more difficult to implement, but IMO a better fit for a
plurality of web applications) and generally fascinated and interested in
distributed systems, databases and systems programming in general]

~~~
jhugg
I like the content of this post, but I take issue with the idea that the
debate only matters in a failure condition. That's academically true, but
practically misleading. When developing my app, I have to assume any part of
the system could fail at any time. Thus my whole app needs to be written with
failure in mind.

In an ACID system, that means my transactions either happen or don't, and it's
on me to figure out whether they did or not. Sometimes I'll need to do some
investigating after a failure to find out what succeeded and what rolled back.

In an eventually consistent system, I need a contingency plan for every
operation on the system. Now that contingency plan might be very simple, such
as "Whatever happened, happened and last writer wins." The CALM conjecture
paper does a decent job of describing when this might be a workable
contingency plan. More complicated operations require increasingly complex
contingency plans. This is why Cassandra is now offering counters as an API
feature, because something as simple as counters is really hard to get right
in all failure scenarios.

If you fit the CALM description and require multi-data center availability,
then EC is probably a good bet. It may still be a good bet otherwise, but the
availability comes along with a much more difficult app development picture,
failures or no.

~~~
strlen
> This is why Cassandra is now offering counters as an API feature

This is specific to Cassandra, where the development team chose not to
implement optimistic locking via version vectors. That said, the counters
patch essentially implements a vector of counters, conceptually very similar
to a vector clock.
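
The "vector of counters" idea can be sketched quickly: each replica increments
only its own slot, the counter's value is the sum of all slots, and two
replica states merge by taking the per-slot maximum. This is the generic
CRDT-style construction, not the actual Cassandra patch:

```python
# Vector-of-counters sketch: each replica owns one slot and only
# increments it; value() sums the slots; merge() takes the per-slot
# max, so concurrent replica states reconcile without coordination.
# (Illustrative only -- not Cassandra's actual counters code.)

def increment(state, replica, amount=1):
    state = dict(state)
    state[replica] = state.get(replica, 0) + amount
    return state

def value(state):
    return sum(state.values())

def merge(a, b):
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

# Two replicas apply increments concurrently, then exchange states.
a = increment({}, "replica-a")          # a sees {replica-a: 1}
b = increment({}, "replica-b", 2)       # b sees {replica-b: 2}
merged = merge(a, b)                    # both converge, value == 3
```

Merging is commutative and idempotent, which is exactly what makes counters
safe to reconcile after a partition without application help.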

There are, however, some scenarios where quorums (in the absence of an
agreed-upon view of the cluster, e.g., a zookeeper-based failure detector)
_could_ mean concurrent vector clocks, which means it's up to the application
to reconcile them.

If your application cannot reconcile a split-brain scenario on its own _and_
depends on a total order, then yes, it's not a good fit for EC.

As I've said, I also think that serializable consistency is easier for
applications to deal with (they can assume a total order) than eventual
consistency but provides the strong benefit of read availability (something,
quite frankly, most Internet applications require).

Most complex applications have points where consistency can be relaxed (either
somewhat or fully) _and_ points where consistency is needed.

In both the "we chose consistency" and "we chose availability" cases, the
applications will have to deal with the consequences of those choices: much as
developers may handwave the implications of an eventually consistent system
(since it returns right results most of the time, without thinking about
whether their application is sensitive to a non-serializable order), they
might also implement ad-hoc, poorly thought out HA solutions on top of
strongly consistent systems -- oftentimes losing _both_ availability _and_
consistency.

The typical usage scenario of masking MySQL failures and latency by using
memcache comes to mind. VoltDB is very correct to remove the need to use
memcached for _read latency_ (a caching layer in front of a database that is
already a cache, which itself sits on top of a file system with a cache, is
one caching layer too many), but in many cases I've seen memcached used to
accept reads (_and_ sometimes writes) while the underlying database is
unavailable.

Finally, the whole debate about atomicity is quite meaningless if (like many
applications) you're fronting your data access layer with an RPC framework:
your RPC call may go through to the database and perform a write even if it
times out from the application's perspective, i.e., implying non-atomicity.

The CALM paper is a big move in the right direction: being able to determine
synchronization points at which ACID transactions (or weaker forms of
synchronization where permissible, e.g., quorums and version vectors) can be
used. Incidentally, I also think a declarative language (whether like
Bloom/Bud or closer to SQL) is a good way to express the constraints and let a
system (which may not be all that different from a query planner) determine
where those points occur.

~~~
jhugg
Overall, another solid comment.

As for RPC, whether the client is informed of success or failure has nothing
to do with atomicity. The guarantee is that the transaction either happens
completely or not at all, and nothing more.

Agreed, it can be very frustrating that under certain failure scenarios, it's
unclear whether the transaction completed or rolled back. Still, given
atomicity, a correctly designed system can't be left in an inconsistent state.
At times, it's up to the application to discover (or re-discover) that state.

Systems that offer atomicity for single record operations can actually make
the same assurances, but may require complicated escrow systems and
compensating transactions.

------
jhugg
The attacks on NoSQL seem a little harsh to me. People needed scalable
systems. They needed them ASAP. Nobody in the RDBMS camp was even hinting at
products targeting this market. Then, the NoSQL camp built systems that scale
(albeit with compromises).

The fact that this has spurred the others to start building scalable RDBMSs is
great. But let's not pretend these new RDBMSs won't have compromises, they'll
just have different ones.

The important thing is for developers to make smart decisions about what
tradeoffs make the most sense for scaling their application. Different
applications will require different tradeoffs.

Disclaimer: I work for VoltDB.

------
va_coder
First off, how do you download Clustrix? With MongoDB, simple as pie:
<http://www.mongodb.org/downloads>

Now, where are the docs? Again, Mongo has great docs
<http://wiki.mongodb.org/display/DOCS/Home>

How can I verify your claims?

Oh that's right, you call a salesperson first....

~~~
PaulHoule
Well, there are quite a few commercial database vendors that offer parallel
and clustered RDBMS products, and many of them appear to be quite good.

Unfortunately, they've got a terrible marketing problem.

Before 1998 or so, a relational database was an expensive product that you got
from a vendor like Oracle. Since then, a generation of people have grown up
that think about using a commercial RDBMS the same way most of us think about
putting our hands in a toilet.

@va_coder hits the nail right on the head, it's not just the cost of the
product, it's the cost of the buying process.

If I want to trial a product that is open source or has an OS or free edition
(that could be mysql, mongodb, postgres or even OpenLink Virtuoso or SQL
Server Express) I can download it, read the docs and play around with it and learn a
lot in a few hours. I might learn that the product is not for me, or I might
get a positive impression and feel ready to commit coding time to it.

If I want to trial Oracle or Clustrix, well, I'm going to have to start a
contact with a sales organization and then they need to pre-qualify me, and
then they need to qualify me and then I'll spend a few hours on the phone
talking to people (which might take a few weeks in wall-clock time.)

Even if they give me a 30 day free trial, I could easily spend $500+ of my
time just getting the trial... And once I've gotten to the point where I'm
negotiating with one vendor I'm going to feel a lot of pressure (internally or
from my superiors) to talk with some competitive vendors too to make sure I'm
making the right decision.

It's a shame because, certainly, a company like Clustrix could use the revenue
they get from product sales to support an awesome development team and really
deliver a better product. On the other hand, they have marketing channels that
are aimed at large organizations that can afford an expensive buying
experience... and it's an expensive selling process for them when they've got
to do "complex sales" that require approvals from a large number of
stakeholders. They've got to pass those costs onto you.

The trouble with this model is that tomorrow's large organizations are today's
small organizations. Today, Facebook could afford just about any commercial
software that's out there. However, they made critical technology decisions
(that are difficult to reverse) back when they were a little company that
could only afford MySQL.

~~~
weavejester
_If I want to trial Oracle or Clustrix, well, I'm going to have to start a
contact with a sales organization and then they need to pre-qualify me, and
then they need to qualify me and then I'll spend a few hours on the phone
talking to people (which might take a few weeks in wall-clock time.)_

If Clustrix offered cloud-based deployment of their database, they could have
a small free tier for people to play around with. Add a console application
for accessing the API. Make it really developer friendly.

Imagine if you could just type:

    
    
        $ sudo gem install clustrix
        $ clustrix create my-test-app
        Created cloud.clustrix.com/my-test-app
        username: my-test-app
        password: xd634shx
    

If they had that, I'd be trying it out in a flash. Then what if they had
per-usage pricing like AWS? Something where you pay per gigabyte. Something
that allows you to test stuff out for a few dollars a month, and then, when
you're ready, scale it right up to a few thousand or more.

~~~
cwcr7
Presumably at some point they will offer a cloud-based service. It
particularly makes sense given the fundamental design of their product.

Since they only launched their product quite recently, it is understandable
that they would want to test it with several customers before offering it as a
service.

In fact, if the product scales well enough, it would be perfect for deploying
as a cloud-based service and could be very successful as such.

~~~
weavejester
Ah, I didn't realise their product was only recently released.

------
ozataman
Another point to note: he seems to have run his benchmarks against a very
trivial schema with a single, simple table. A big (speed, simplicity)
advantage of NoSQL is the ability to embed lots of data within the parent
model and manage a single table where you would have to manage many in a SQL
database.

I would be very interested to see a comparison where a large and "real" data
model (that contains 6-7 "joined" tables for the SQL setup and a single table
with the embedded document model for the No-SQL setup) is injected into each
of the technologies.

Also, it is just horrible style not to include the benchmark code for peer
review. Delivers near-0 credibility.

~~~
kchodorow
Excellent point. MongoDB doesn't claim to be any faster than other dbs at the
simple stuff (in fact, it's often slower because we haven't had years to
optimize everything). The speed gains that people usually see are because they
can just fetch one document, instead of doing complex joins or aggregations.

~~~
cheald
Indeed. A lot of the reason I like document databases so much is that you can
solve problems differently (and many times, much more easily) than you could
in a relational database. Compare:

    
    
        db.posts.find({tags: {$in: ["foo", "bar"]}})
    

to:

    
    
        SELECT * FROM posts JOIN taggings ON taggings.post_id = posts.id JOIN tags ON tags.id = taggings.tag_id WHERE tags.name IN ('foo', 'bar');
    

(Single query tag lookup; naively joins the entire taggings and tags tables
before limiting with WHERE)

Or a "better" query with two subselects (yikes!)

    
    
        SELECT * FROM posts WHERE posts.id IN (SELECT taggings.post_id FROM taggings WHERE taggings.tag_id IN (SELECT id FROM tags WHERE tags.name IN ('foo', 'bar')))
    

And that's the "find where any tag matches" case. Try the "when all tags
match" case ($all in MongoDB), and you'll go grey a few years earlier.
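
For the curious, the usual relational answer to the $all case is GROUP BY with
a HAVING COUNT filter rather than N self-joins. A runnable sketch against the
hypothetical posts/taggings/tags schema from the queries above, using SQLite:

```python
import sqlite3

# Relational equivalent of Mongo's $all: keep only posts whose matched
# tag count equals the number of tags requested. Schema follows the
# hypothetical posts/taggings/tags layout used in the queries above.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE taggings (post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts VALUES (1), (2);
    INSERT INTO tags VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 1);  -- post 1 has both
""")
rows = db.execute("""
    SELECT posts.id
      FROM posts
      JOIN taggings ON taggings.post_id = posts.id
      JOIN tags     ON tags.id = taggings.tag_id
     WHERE tags.name IN ('foo', 'bar')
     GROUP BY posts.id
    HAVING COUNT(DISTINCT tags.name) = 2
""").fetchall()
# rows == [(1,)]: only post 1 carries both 'foo' and 'bar'
```

Still not pretty next to `{tags: {$all: ["foo", "bar"]}}`, which rather proves
the point.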

~~~
fedd
relational folks can denormalize when they want to

~~~
sqrt17
They get strangled by their DBA before they're finished with it.

And: the point of document stores is that you can denormalize (i.e. put lots
of stuff into one data item) and _still_ use indices into the lots of stuff.
Most commercial RDBMSes have means and ways to do that (e.g. storage of XML
content), but afaik it's not standardized and would tie you closely to a
single ($$$$$$) database vendor who will not hesitate to make you bleed
whenever they can.

~~~
fedd
things like olap and intentional denormalization are not what dbas are
against.

by denormalization i didn't mean keeping xml in a clob, but keeping the values
in a row as if several tables were already joined into one wide table, thus
eliminating the need for joins.

------
wmf
Price is the elephant in the room here; NoSQL exists because people aren't
willing to pay for real databases. Building yet another expensive (i.e. > $0)
database that doesn't even work in the cloud won't help the Web 2.0 crowd.

~~~
sergei
These folks disagree with you. There are many more behind them.

<http://gigaom.com/cloud/clustrix-lifts-the-curtain-on-early-database-customers/>

And plenty of folks use MySQL (and PostgreSQL to a much lesser extent). You
just can't scale those.

~~~
space-monkey
Last time I heard, twitter still uses MySQL for the statuses (tweets) table.
They did have a plan to migrate to Cassandra, but didn't go all the way
through with it. So I find it hard to agree with "you just can't scale those".
It may be a lot of work, but for some applications you can scale them.

Edit: twitter cassandra link (don't know if this is the latest):
<http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html>

~~~
ethangunderson
It's worth noting that Twitter has built a lot on top of MySQL to get to the
scale they're at. Take FlockDB for example,
<https://github.com/twitter/flockdb>

So I guess that statement should be written as "you can't scale with _just_
those". :)

------
moe
Winning a benchmark against MongoDB on a non-trivial workload is a little bit
like winning the special olympics.

I'd be more curious to see how Clustrix performs against Cassandra, Riak or
HBase in their respective domains. Those seem to be the more serious
contenders when it comes to "Big Data".

~~~
sergei
I chose Mongo because it gets a lot more attention on HN than any other
database. I don't remember the last time I saw a post on Cassandra on here...

~~~
btilly
I think I see Redis more than Mongo.

------
psadauskas
A blog post by the founder, without posting the benchmarks themselves? How can
anyone expect to take this seriously?

~~~
sergei
OK. Fair enough. I'll post the benchmarks.

~~~
ajays
You can't randomly throw numbers around without listing your benchmark code,
the config files, etc.

Secondly: even though it becomes clear eventually, you should mention up front
your relationship with Clustrix. Just because you are a founder of Clustrix
doesn't necessarily invalidate your findings, but full disclosure is always a
good idea.

------
luigi
He talks about populating MongoDB with "rows". No one should be using (or
benchmarking) a technology unless they understand what it actually is, and
what problems it aims to solve.

~~~
jdf
The term "rows" could certainly be replaced with "tuples", "objects", or
whatever else you prefer without changing any meaning of the post. If you look
at the benchmarks posted on Mongo's site

<http://www.mongodb.org/display/DOCS/Benchmarks>

you can see that most of them compare Mongo against MySQL. Certainly "rows"
are being inserted into the latter.

------
jnewland
has anyone out there actually used or evaluated clustrix? i haven't been able
to find anything on the internet about this thing that hasn't come straight
from the company.

------
mjw0
Spending millions on engineer time to avoid buying a real database is not good
finance either. At least with this approach you can start with the (free)
MySQL and only pay once you're sure your idea has traction.

------
fedd
> Interestingly enough, we never heard that SQL or the relational model was
> the root of all their problems.

i really think that sometimes SQL and the relational model are a problem. at
least, relational database design is a university course of its own, along
with (object oriented) programming. so, to properly use postgres or mysql in
your sophomore web startup, you should know two things well, or have a clever
db guy...

i have seen brilliant web programmers that design nightmare database schemas.

~~~
mjw0
So relational data is hard but implementing appropriate consistency checks in
your application is easy? Or do you just skip that second part and hope for
the best?

~~~
fedd
imho, the second, except that some consistency is present in document dbs:
like in the order example, all order items will be within the document and
won't be lost without losing the whole order.

no-one would ever use 'eventually consistent' databases for financial or
trading data, i think. they're for highly scalable consumer web projects

------
richardmarr
Sergei,

I think it was a mistake to put this post on a blog with no other content. I
think it leaves the reader with the impression that this is the only thing you
want to contribute to the community... and as other commenters have pointed
out that contribution could be perceived negatively.

Something to think about next time.

~~~
JoachimSchipper
Every blog has a first post. What's so bad about that?

------
sergei
I updated the post with the benchmark source.

~~~
fedd
you updated your post and i am waiting for people here at _hacker_ news to
criticize your benchmark setup.

maybe i missed smth but i still see only the suggestion that you send your
benchmark to the mongodb guys.

edit: please discard, now i see that they say you should include joins in
your clustrix benchmark. waiting for the reply

------
drm237
How long does it take to add a column to that table with 300MM rows? Does
Clustrix require schema modifications?

