
The genius and folly of MongoDB - throwawayeau
http://nyeggen.com/blog/2013/10/18/the-genius-and-folly-of-mongodb/
======
rdtsc
The problem with MongoDB is their shadiness. They shipped with unacknowledged
writes until not too long ago. In other words, you would write to it and
there wouldn't be an ok-or-fail response; you'd just sort of hope it would go
in.

They fixed that problem but it was too late. In my eyes they proved they are
not to be trusted with data.

Had they called themselves MangoCache or MongoProbabilisticStorage, fine, it
can silently drop writes, I don't care, it's not a database. But telling people
they are a "database" and then tweaking their defaults to look good in stupid
little benchmarks, and telling people they are webscale, sealed the deal for me.
Never looking at that product again.

~~~
functional_test
I understand some of the reasons people didn't like Mongo, but this always
vexed me. The default write level was _very_ clearly documented and you could
always change it as necessary. Surely it would be necessary to read the
documentation of a database before rolling it out to production?

~~~
rdtsc
> Surely it would be necessary to read the documentation of a database before
> rolling it out to production?

You buy a car. It comes with the brakes disabled because, for whatever reason,
that also lets it reach a higher top speed. You are expected to read your car
owner's manual, and on page 54 you find that you have to hold an "enable
brakes" button under the console for 10 seconds to turn on your brakes. Would
it vex you that people might be slightly critical of that car? Clearly they
are silly for not reading their car manual until page 54.

That "feature" is not something that should be discovered by reading docs or
when you get a crash and then load a backup from another week and still get a
crash and then you start hitting your head on your desk.

Anything calling itself a "database" should not have shipped with those
default settings _ever_. If they did they might have gotten away with it in my
book by having a big flashing red warning on the front or download page. I
don't remember one.

~~~
functional_test
I would also read the manual of a car I just bought before driving it. I guess
that's just my style.

Don't get me wrong, I'm not saying your assumption is unreasonable. But in the
end, it's on you as a conscientious developer to read the documentation. I'm
not even suggesting cover to cover - in this case though they are very up
front about write concerns. There is no real excuse to find this out any other
way, it's just negligence.

~~~
clwk
Out of curiosity, _have_ you ever bought a car, and _did_ you actually read
the whole manual before driving it?

It's a nice hypothetical, and that might be your style. Most real-world car
purchase scenarios I'm familiar with would make that style impractical.

~~~
functional_test
Literally 1 month ago, I bought a car and did exactly this. What's impractical
about spending a little time to read?

~~~
vanadium
Your approach is wholly impractical on its face, actually.

So, let me get this straight: You laid down tens of thousands of dollars on a
vehicle that you only post-purchase read the manual of, and you're raising
this as some sort of standard people should follow?

Honestly, asking the right questions (and test-driving) upfront should be what
lands the purchase, and not discovering the folly of purchasing a car with
such ass-backwards issues you only discover after the fact when you bother to
dig out the manual.

You drove it off the lot after you bought it, right? Or did you read the
manual in the lot right after signing the papers locking you into the
purchase?

~~~
functional_test
Turns out you can read the manual of a vehicle ahead of time. Turns out you
can also test drive and do everything else you said, and we don't need to
pretend that it's all mutually exclusive. Stop being a pedant -- me listing
every bit of due diligence about my car isn't relevant, so let's stay on
topic.

Honestly, how can people on HN actually be this against _reading_? Especially
things that are really important? Sure, don't read the contest rules for your
McDonald's monopoly. But if the data for your livelihood depends on something,
there's no excuse for not reading the documentation.

------
bkanber
I posted this further down the thread, but I thought I'd share my thoughts on
why I like mongo.

Most people don't like mongo because 10gen gives the impression that mongo is
better than it actually is; many people feel that mongo is not reliable enough
for at-scale applications. They're right; it's not. But that's ok, because:

Mongo's really great for rapid prototyping. You don't need to worry about
updating the schema at the db level, it can store any type of document in any
collection without complaining, it's really easy to install and configure, the
query language is simple and only takes a couple of minutes to learn, it's
pretty fast in most use cases, it's pretty safe in most use cases, and it's
easy to create a replica set once your prototype gets usage and starts
scaling.

Mongo does everything well up until you reach the level where you need heavy-
hitting, at-scale, mission-critical performance and reliability. Most projects
out there (99 in 100?) will never reach the level of scale that requires
better tools than mongo. And since the rest of it is so easy to use, that
makes mongo a great starting point for most projects. You can always switch
databases later, but mongo gives you the flexibility to concentrate on more
important things in the early stages of a project.

~~~
danpalmer
The case for using it as a prototyping database is the best use case I've seen
for Mongo; however, I'm not sure it's always a good idea.

For a hack-weekend sort of project, fine, but if you are in any way attempting
to make a product, it strikes me as the sort of thing that would be really
difficult to change later down the line, and so worth investing the very
little extra effort it takes to include your schema in the database, and use
something like Postgres/MySQL/etc.

~~~
spamizbad
It is really difficult to change down the line. My company tried and failed.
Now we're stuck with MongoDB. Huge mistake, but a lesson was learned.

Edit: "tried and failed" in the political sense. You don't change horses in
midstream etc.

~~~
gaius
If this was a proprietary database we'd call that vendor lock-in and advocate
an open source solution. 10gen is a company that earns its revenue from
selling support. They are highly incentivized to lure you in and trap you in a
situation that requires a lot of consulting.

~~~
threeseed
Or they could just be interested in adding useful features.

PostgreSQL has HSTORE which is a useful but proprietary feature. Cassandra has
the ability to have Lists/Maps as data types. Again useful but proprietary.

If you are that concerned about database independence then do what everyone
else does. Use an ORM, minimise coupling in your domain model and do as much
as possible in the application layer.

~~~
integraton
You are misusing the word "proprietary."
[http://en.wikipedia.org/wiki/Proprietary_software](http://en.wikipedia.org/wiki/Proprietary_software)

------
rgo
Previous versions of my startup's enterprise product used to be based on
relational DBs (mostly Oracle, MySQL also). This year we switched to Mongo and
dropped RDBMS support.

RDBMS performance was fine most of the time as we're not doing big data
really. Our problem was developing and maintaining a schema that holds lots of
metadata many levels deep. Our app allows for unlimited user defined forms and
fields, some of which may hold grids inside which hold some more fields... Our
app also handles lots of logs and large file dumps, which slowly made data,
cache and fulltext search management mission impossible. Even though we had
considerable previous experience with Mongo, it took us a long time to switch
because we were utterly scared. It's nice to sell a product that is Oracle-
based, as that sent out our "high level of industry standardization and
corporate commitment" bullshit message that (we thought) is quite positive
for a startup competing against the likes of IBM, HP, etc.

To our surprise, our customers (some Fortune 500 and the like) were VERY
receptive to switching to a NoSQL, open-source database. Surprising especially
given it would be supported by us instead of their dreadfully expensive and
mostly useless DBA departments. It even came to a point where it changed their
perception of our product and our company as next generation, and surprisingly
set us apart from our competition even further.

In short, as many people here know, not all MongoDB users are cool kids in
startups that need to fend off HN front page peak traffic day in day out.
Having a schemaless, easy to manage database is a step forward for sooo many
use cases, from little intranet apps to log storage to some crazy homebrew
queue-like thing. 10gen's superb, although criticized, "marketing effort" also
helps a lot when you need to convince a customer's upper management that this
is something they should trust and even invest in. I can't express my gratitude
and appreciation for 10gen's simultaneous interest in community building,
flirting with corporate bigwigs and getting the word out to developers in every
other language. Mongo is definitely a flawed product, but why should I care
about the clownshoeness of its mmapped files when it has given us so much for
so long?

~~~
continuations
> Having a schemaless, easy to manage database is a step forward for sooo many
> use cases

Can you explain why you can't do schemaless with an RDBMS?

From what I understand MongoDB is schemaless by storing all fields as one
single JSON document. So what stops you from doing the same in an RDBMS - have
a catch-all field "JSON" and store all your data there?
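A sketch of that catch-all approach, using SQLite's built-in JSON functions as a stand-in for a full RDBMS (table and field names are illustrative, and this assumes an SQLite build with the JSON1 functions, which most modern builds have):

```python
# "Schemaless in an RDBMS": one table, one JSON blob column.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO docs (body) VALUES (?)",
           (json.dumps({"name": "alice", "score": 42}),))

# Querying by structure, which is what a plain opaque blob can't give you:
row = db.execute(
    "SELECT json_extract(body, '$.name') FROM docs "
    "WHERE json_extract(body, '$.score') > 10").fetchone()
assert row[0] == "alice"
```

Without the JSON functions you can still store the blob, but every query that touches a field inside it must happen in application code.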

~~~
aaronem
That gets you halfway there, but you still don't have the ability to query
your datastore by structure, unless you've installed PostgreSQL 9.3 and are
using its JSON field type, which does have that capability, thus entirely
demolishing the NoSQL USP as far as I can determine.

~~~
edraferi
That is awesome.

~~~
aaronem
Also of note is that stored procedures are supported in a variety of
languages, including Javascript, so it's quite easy to handle cases where the
surprisingly broad range of core JSON functions and operators [1] doesn't
include what you need.

PostgreSQL has also recently added a key-value store type [2] with semantics
reminiscent of Redis. The impression I get is that they're gunning for the
NoSQL kids in general, and this pleases me; while I grant it is sometimes
possible and necessary to obtain new insight in a field by ignoring all that's
gone before, I very much doubt this is one of those times, and I am therefore
delighted to see a properly engineered database engine gain more or less the
entirety of the features which draw interest to the NoSQL crowd in the first
place.

[1] [http://www.postgresql.org/docs/9.3/static/functions-
json.htm...](http://www.postgresql.org/docs/9.3/static/functions-json.html)
[2]
[http://www.postgresql.org/docs/9.3/static/hstore.html](http://www.postgresql.org/docs/9.3/static/hstore.html)

~~~
chris_wot
That is extremely interesting. So it looks like you can store a JSON type _as
well as_ a KV datatype in Postgres! And it looks like it is relatively easy to
convert between the two.

This leaves only performance. I think I'm still confused around this area -
why do people say that non-relational technologies like MongoDB are faster
than relational databases?

------
willvarfar
Varnish famously demonstrated how to use the kernel page cache effectively.
MongoDB, though, is Squid-like. It's an interesting comparison.

Every single MongoDB step has had the old timers groaning.

Even with something solid like Tokutek's storage engine in it, it's going to be
a hard sell.

~~~
leif
_I'm an engineer at Tokutek_

I'm confused by your comment. The beginning acknowledges the fact that MongoDB
has a weak storage engine, but your conclusion is that, even with a strong
storage engine like ours, there is still a problem. What other problems do you
see? Are they something we could work on?

~~~
jpgvm
This is going to come off as a bit negative but I kinda feel it has to be said.
I would first like to say I do love the Fractal Tree indexing; very cool, and
it could have a lot more interesting use cases outside of databases (I'm
thinking logical volume/block storage etc.. I'm always thinking in kernel land..)

The problem is that Mongo advertised itself as a database and wasn't one. Once
you do that, the reputation of the product is dead forever.

TokuMX is a real database as far as I can see, MVCC, great indexing story etc.

By association TokuMX is probably not regarded as highly as it should be.
Which is a shame but it's a people problem, not a technical one. People can
very easily lose trust in a technology at which point it's effectively dead,
it might take a long time to die due to lock-in but it's dead.

For instance I have recently started playing with RethinkDB over TokuMX, almost
purely because of the Mongo association.

Now technically that might not sound like good reasoning, but when you think
about the kind of person who writes a database that doesn't fsync your writes
by default and relies on the page cache over doing direct I/O when building a
database.. it doesn't really inspire confidence in the network stack, the
query planner.. or, well, anything.

If anything it makes you insistent on not having ANYTHING to do with that sort
of codebase.

Just replacing the storage engine might actually be good enough, but restoring
my trust in the rest of the codebase is almost a lost cause at this
point.

~~~
leif
I've seen a lot of the rest of their code, and most of it is getting better
over time, as they grow they're forced to adopt better habits in order to
scale their engineering team. I think you're misunderstanding the type of
programmers they are. They didn't use mmap because they are sloppy everywhere,
they used mmap because their critical innovation was not in storage. What they
really thought was valuable, what they wanted to work on, was the query
language and cluster management tools, so they did the simplest thing for
storage and moved on (personally I don't understand why they didn't just use
BDB, maybe they were afraid of transactions, but I suppose everyone has a
little NIH syndrome in their database). Now they're a bit locked in to that
code, because after bolting on journaling (that architecture is a brilliant
but incredibly dirty hack), the code is a mess and I'm sure nobody wants to
touch it. In fact most of the other subsystems have been getting cleaner
rewrites, except for the storage layer. I think the only way out is a complete
replacement, which is what we did so I feel pretty good about that. So I don't
know if I'll convince you, but I've read a lot of their code (especially in
the last few weeks, I've been backporting things from 2.4), and that's the
feeling I get about their history and vision. Hope it gives you some insight.

~~~
jpgvm
I don't think the problem is that I misunderstand them; I just disagree with them.

I disagree with them on what the minimum viable product for a database is. I
come from a storage and service provider background where failures are treated
very harshly (usually the death of the company for a single mistake), so I
take releasing a product that stores customer data very seriously.

To be honest this is the biggest attraction for me to RethinkDB. They waited a
sufficiently long time with a commercially backed team of very competent
engineers who obviously have the required background to sit down and DESIGN a
database. The query language generates a non-Turing-complete language with a
clean AST that has all the right deterministic characteristics to implement a
powerful planner/optimizer. Their on-disk format has been a bit in flux but
the core design is excellent and you can see that it has been optimized for
very fast range queries. Even the API protocol and serialization were designed
with care, not to mention the excellent ReQL language and the attention to
detail when integrating drivers into the host language.

Which is the other thing I tend to dislike about Mongo: it reeks of a lack of
design. The journalling effort, for instance, as you pointed out, is very ad
hoc; this goes for GridFS and a lot of the other features they have integrated
into the codebase. These are smells that I can't ignore when looking at a
product that I need to trust with my data.

The counter-argument is to not trust it with your data. But I have yet to find
a case where that makes sense and another datastore wouldn't be a better
choice.

------
rogerbinns
There are also people using MongoDB and finding it meets their needs well, and
don't feel the need to keep writing about how everything sucks or is
wonderful. (I'm one of them.)

None of how MongoDB works is a secret. And just like everything else it has
sweet spots and problem areas. And like many others, development continues and
it gets better.

The database does not get the job done - it is a tool to help get the job
done.

~~~
orthecreedence
> None of how MongoDB works is a secret.

Maybe not now, but this hasn't always been the case. The fact that they had
(have?) a global write lock was completely buried on the doc site for ages.
Benchmarks were waved in front of developers' faces to distract them from the
"drivers don't actually write data, they just blast it out in every direction
and hope it lands somewhere good" BS.

I don't use Mongo anymore, and I think a lot of it has not to do with the
database itself, but with the way 10gen used their marketing machine in a
dishonest way. They incurred a lot of trust-debt, and now have a serious
amount of work to do to pay it back.

~~~
rit
I worked for 10gen (now MongoDB) for over 2 years (I left in December).

Never once while I was there did they publish a benchmark: There was a
[publicly] stated company policy to not publish or comment on benchmarks.

If you have evidence otherwise (i.e. benchmarks published by the folks working
on MongoDB) fine, but I take this as a deliberately inflammatory (and false)
statement.

EDIT: The global write lock was removed ~last August; there is now a database-
level lock. Future releases will likely make that more fine-grained.
Additionally, the drivers no longer do "unsafe" writes, but check with the
server, as of the same release.

~~~
mdellabitta
> database level lock

But that's not anywhere close to good enough.

~~~
lowboy
> But that's not anywhere close to good enough _for me_

FTFY

~~~
mdellabitta
> But that's not anywhere close to good enough for concurrent, multiuser
> systems with reasonable traffic.

FTFY

~~~
lowboy
Mmhmm. Like I said, that's not good enough _for you_ and for _your needs_. For
other people it's fine. That's important to note.

~~~
lucian1900
At that point you might as well use a JSON file per database and lock the
entire file, parse, change, then serialise again. That might even be faster.
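For what it's worth, the whole joke fits in a few lines. A toy sketch of exactly that scheme (POSIX-only, and every name here is made up):

```python
# One JSON file as the "database", one exclusive lock as the
# "database-level lock": parse everything, mutate, serialize everything.
import fcntl
import json
import os
import tempfile

def update(path, mutate):
    # Open read/write, creating an empty store on first use.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)       # lock the entire "database"
        raw = f.read()
        data = json.loads(raw) if raw else {}
        mutate(data)                        # apply the change in memory
        f.seek(0)
        f.truncate()
        json.dump(data, f)                  # rewrite the whole file
        fcntl.flock(f, fcntl.LOCK_UN)

path = os.path.join(tempfile.mkdtemp(), "toy.json")
update(path, lambda d: d.update(visits=d.get("visits", 0) + 1))
update(path, lambda d: d.update(visits=d.get("visits", 0) + 1))
```

Every write serializes the full dataset, which is the sarcastic point being made about coarse locks.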

------
yeukhon
I am so happy Postgres is adding support for JSON. This is a big change. The
sole benefit of mongo to me is that you can be flexible with your schema at
the beginning. But the consequences are

* you have to learn to do indexing right later (if you have to scale)

* failures and misses start to occur (as you scale)

* more code to write to manage legacy schema and optional fields

The last is painful and ugly. Whereas if you start out with a good schema, that
last point is in good hands. With SQL, when "xyz" attributes are repeating you
can just factor them out into a new relation, whereas with mongo you'd stuff 20
fields into a single collection. The refactoring is harder.

I will begin to migrate back to SQL for new projects.

Also, the ecosystem is richer in SQL. I have not seen a good ORM for Mongo.
MongoEngine is fine, but implementation and db issues make that ORM a bit
unusable from time to time. SQLAlchemy is good.

PS: For quick PoC and Hackathon projects sure prototyping with mongo is fine.

~~~
warmwaffles
> I have not seen a good ORM for Mongo

Uh, Mongoid is good. I have used it on past projects. It's nice. (mainly use
some form of SQL now)

------
functional_test
He's right that MongoDB could use improvements like string interning so you
don't need to worry about field names. But overall, I think this article is
very misleading.

If you use MongoDB in production, you should definitely take the time to learn
about the durability options on the database side AND in your driver. By using
them appropriately, you can have as little or as much durability as you like. Data sets
larger than 100GB are no problem either -- right now I'm running an instance
with a 1.6TB database.

As always, use the right tool for the right job. If you need joins/etc. and
don't need unstructured data, Mongo probably isn't a great choice (even with
the aggregation framework).

~~~
yummyfajitas
For what use cases is Mongo the right tool?

~~~
JulianMorrison
You have a smallish number of documents where some particular field of fixed
size gets overwritten a lot, the old values are uninteresting, and it wouldn't
really be a tragedy if your data got trashed. For example, it's the player's
score.

You want a fixed-size, rolling backlog of time series data such as logs.
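The "fixed-size, rolling backlog" here refers to Mongo's capped collections. The eviction behavior can be sketched with a stdlib analogue (purely illustrative, not Mongo code):

```python
# Capped-collection behavior in miniature: a bounded buffer where the
# oldest entries fall off automatically as new ones arrive.
from collections import deque

log = deque(maxlen=3)                      # "capped" at 3 entries
for i in range(5):
    log.append({"t": i, "msg": f"event {i}"})

assert [e["t"] for e in log] == [2, 3, 4]  # entries 0 and 1 were evicted
```

The appeal is that storage stays constant no matter how long the system runs.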

~~~
twic
Is it better than a relational database for that?

~~~
yummyfajitas
Postgres update performance is pretty bad. When running a big data migration,
it's generally faster to copy the old table to a new temporary table and
rename the temp table to the old table than it is to run an update.
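The copy-and-rename trick described above, sketched against SQLite (table names are illustrative; the same shape works in Postgres or MySQL):

```python
# Instead of a big in-place UPDATE, build the transformed table fresh
# and swap the names.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

# The "migration": uppercase every name via a new table, then rename.
db.execute("CREATE TABLE users_new AS SELECT id, UPPER(name) AS name FROM users")
db.execute("DROP TABLE users")
db.execute("ALTER TABLE users_new RENAME TO users")

names = db.execute("SELECT name FROM users ORDER BY id").fetchall()
assert names == [("ANN",), ("BOB",)]
```

The win comes from sequential writes into a fresh table versus random-access rewrites of existing rows.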

~~~
ams6110
This is the case with all RDBMS I've used.

~~~
pkolaczk
RDBMSes need to check for primary key violations, hence read before write.
Random access is slow. The fastest you could do is "no read-before-write,
append-only writes, compact later" (Cassandra way).
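That write path can be sketched as a toy log-structured store (names are illustrative; real Cassandra adds memtables, SSTables, and much more):

```python
# "No read-before-write": writes blindly append; a later compaction pass
# keeps only the newest record per key.
import time

log = []  # append-only (key, value, timestamp) records

def write(key, value):
    log.append((key, value, time.monotonic()))  # no read, no key check

def compact(records):
    latest = {}
    for key, value, ts in records:              # later entries win
        latest[key] = (key, value, ts)
    return list(latest.values())

write("a", 1)
write("b", 2)
write("a", 3)                                    # overwrites "a" logically
compacted = compact(log)
assert sorted((k, v) for k, v, _ in compacted) == [("a", 3), ("b", 2)]
```

Writes never pay for a random-access read; the cost is deferred to compaction.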

------
mattdeboard
Really like this article. I try not to dump on MongoDB too much because
frankly I have never taken the time to understand its internals. I constrain
my criticisms to particular unnecessary failures/inadequacies that I
personally have experienced (or any "I'll just use mongodb so I don't have to
worry about my data" sentiment).

Funny punchline at the end there too.

~~~
calinet6
I like this too. There's very little to criticize, since it basically tells it
exactly like it is without _too_ much embellishment, and makes intelligent,
honest conclusions.

------
anatari
Article is spot on about mongodb being ideal for online games. We use it as
the main datastore for our latest game, and it has worked out very well for
us. My main gripes with it have been key values taking up too much space and
how difficult it is to shard. I think RethinkDB will be even better once it
matures.

~~~
meowface
RethinkDB looks like a much better database than MongoDB.

Unfortunately though, I believe Mongo is still beating it at performance,
which is the one thing keeping me away.

~~~
dodyg
Read performance or write?

------
bithive123
This article doesn't really make a case for "genius" -- "saving grace",
maybe. And in what universe are the Redis data structures "crazy"?

~~~
fiatmoney
Redis internal data structures are quite sophisticated.

~~~
bhahn
How does being sophisticated imply crazy?

~~~
dcre
It was a figure of speech; he just meant "non-standard."

------
mistercow
>But in that case, it also wouldn’t be crazy to pull a Viaweb and store it on
the file system

I've done this before when I was doing work for a client using an existing
simple web host with no built-in options for databases. It works well, and the
nice part is that there's a simple, obvious way to do any query. The bad part
is that anything other than a primary key lookup is slow unless you add a lot
of complexity.
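A minimal sketch of that file-per-record approach, assuming JSON files named by primary key (every name here is hypothetical, not Viaweb's actual scheme):

```python
# Primary key = file name, so key lookups are trivial; anything else
# means scanning every record, which is the slow part described above.
import json
import os
import tempfile

def save(root, key, obj):
    with open(os.path.join(root, key + ".json"), "w") as f:
        json.dump(obj, f)

def load(root, key):                       # fast: primary-key lookup
    with open(os.path.join(root, key + ".json")) as f:
        return json.load(f)

def find(root, pred):                      # slow: full scan of the "table"
    for name in os.listdir(root):
        with open(os.path.join(root, name)) as f:
            obj = json.load(f)
        if pred(obj):
            yield obj

root = tempfile.mkdtemp()
save(root, "u1", {"name": "alice", "score": 10})
save(root, "u2", {"name": "bob", "score": 3})
```

Avoiding the `find`-style scans is exactly where the "lot of complexity" (indexes, caches) creeps in.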

------
programminggeek
I've used MongoDB for various projects and found it nice to use. Lately
though, I've found MySQL to be pretty enjoyable too, so honestly, what's all
the fuss? It's a database.

Nobody writes about the filesystem like they do the database, and yet they do
the same job - store and retrieve data.

~~~
rsynnott
> so honestly, what's all the fuss? It's a database.

Different types of databases are useful for different things.

> Nobody writes about the filesystem like they do the database

You must have missed the last decade of people going on about ZFS.

~~~
programminggeek
> You must have missed the last decade of people going on about ZFS.

I guess I did, what's the big deal about ZFS?

------
lafar6502
Maybe MongoDB is clown's shoes, but so are 99% of all technology startups.
They all fail before reaching limits of the database.

------
bsg75
> MongoDB is easy to make fun of.

I think more often it's easy to poke fun at _how_ it's used.

When any tool or tech is adopted everywhere before its limitations are known,
problems are likely. Attempting to use MongoDB in all storage or persistence
scenarios is no more sensible than using MySQL in all cases.

Yes, there is marketing around this product that must be looked at critically
- after taking into account that many newly developed technologies won't solve
all the problems older tech have worked for decades to solve.

~~~
hayksaakian
I think the difference is you don't get made fun of online for using MySQL for
everything

~~~
cincinnatus
MySQL is richly deserving of ridicule as well. Its ubiquity is unfortunate.

~~~
jbooth
If you're going to comment so strongly, some explanation of why it deserves
such ridicule would contribute much more value to the discussion.

~~~
twic
No check constraints. Spotty transaction isolation. Silent data corruption if
you happen to make certain kinds of updates while using statement-based
replication. No on-line schema updates (is that still true?). Complete
inability to execute joins of any size in reasonable time due to the lack of
merge or hash join strategies. Corresponding inability to handle subqueries of
any complexity. Readers block writers (at table level with MyISAM - and still
at row level with InnoDB?).

As well as things like that which are actually ridiculous, there is also the
substantial gap in features as compared to real databases. Things like
recursive queries, user-defined types, partial indices, etc, are commonplace
in the more sophisticated databases. You probably won't need them for a simple
web application (or even a complex one!), but they can be very useful when
trying to do more complex things, or manage a complex system efficiently.

~~~
AlisdairO
I believe that InnoDB is an MVCC implementation, so readers blocking writers
shouldn't happen. Another thing to add to your list is missing window
functions.

I'm not a big MySQL fan at all, but it's still leaps and bounds ahead of mongo
technologically.

~~~
mdellabitta
Just ran into this link, which seems to describe how MVCC can sometimes not be
enough.

[http://ronaldbradford.com/blog/understanding-innodb-
mvcc-200...](http://ronaldbradford.com/blog/understanding-innodb-
mvcc-2009-07-15/)

~~~
AlisdairO
Having read through it, I rather suspect that that's not a matter of writers
blocking readers or the other way round, but instead a case of writers
blocking writers - he's writing a lot of data to the table, and it's highly
likely that InnoDB has escalated the lock to a table lock - which effectively
prevents concurrent writes.

~~~
twic
I don't believe that InnoDB escalates locks. Quoting from the fine manual:

[http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-
mo...](http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html)

> InnoDB does locking on the row level and runs queries as nonlocking
> consistent reads by default, in the style of Oracle. The lock information in
> InnoDB is stored so space-efficiently that lock escalation is not needed:
> Typically, several users are permitted to lock every row in InnoDB tables,
> or any random subset of the rows, without causing InnoDB memory exhaustion.

------
nawitus
So, what's a good NoSQL database for e.g. node.js use? The only alternative I
know of is CouchDB. (Yes, I should give more parameters about the intended
use, but I really don't know any alternatives).

~~~
nostrademons
What's wrong with the Viaweb/Arc/HackerNews/Mailinator approach of just using
in-memory datastructures (hashtables, linked lists) and then journaling out
changes to the filesystem as records that are read in on startup? It's
incredibly simple and blindingly fast as long as you stay on one server, and
you can get several thousand QPS of capacity on that one server (vs. like 10
with a Django/Rails + SQL database solution).
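A minimal sketch of that journal-and-replay approach (class and field names are made up, not HN's or Mailinator's actual code):

```python
# All reads hit an in-memory dict; every write is appended to a journal
# file that is replayed on startup to rebuild the state.
import json
import os
import tempfile

class JournaledStore:
    def __init__(self, path):
        self.path, self.data = path, {}
        if os.path.exists(path):                 # replay journal on startup
            with open(path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.data[rec["k"]] = rec["v"]

    def put(self, key, value):
        with open(self.path, "a") as f:          # journal first, then apply
            f.write(json.dumps({"k": key, "v": value}) + "\n")
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)                # pure in-memory read

path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(path)
store.put("user:1", {"karma": 10})
store.put("user:1", {"karma": 11})
recovered = JournaledStore(path)                 # simulated restart
```

Reads never touch disk, which is where the claimed throughput comes from; durability is whatever the journal append gives you.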

Another highly underrated solution is using MySQL/PostGres as a key-value
store. Just create one table for each entity type, with the primary key as the
key and a JSON or protobuf blob as the value. You're using completely battle-
tested solutions, you've got bindings in basically every language, you're
doing basically the same work (at the same speed) as your NoSQL solutions, but
you have a lot more flexibility to add additional indices and can rely more on
pre-existing functionality than a MongoDB or CouchDB solution.
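A sketch of that entity-table pattern, with SQLite standing in for MySQL/Postgres (in Postgres you would likely use `INSERT ... ON CONFLICT` rather than `INSERT OR REPLACE`; names are illustrative):

```python
# One table per entity type: primary key + JSON blob, nothing else.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user (key TEXT PRIMARY KEY, value TEXT)")

def put(key, obj):
    db.execute("INSERT OR REPLACE INTO user VALUES (?, ?)",
               (key, json.dumps(obj)))

def get(key):
    row = db.execute("SELECT value FROM user WHERE key = ?",
                     (key,)).fetchone()
    return json.loads(row[0]) if row else None

put("u1", {"name": "alice", "plan": "pro"})
assert get("u1")["plan"] == "pro"
```

The flexibility mentioned above comes from being able to add real indexed columns alongside the blob later, without leaving the battle-tested engine.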

~~~
byroot
> What's wrong with the Viaweb/Arc/HackerNews/Mailinator approach of just
> using in-memory datastructures

[https://news.ycombinator.com/x?fnid=cjVXpi8HxVR5TTze3bqSCa](https://news.ycombinator.com/x?fnid=cjVXpi8HxVR5TTze3bqSCa)

    
    
      Unknown or expired link.
    

Oh I remember now...

~~~
pg
That's caused by using closures to create dynamically generated "callbacks" on
the server, not keeping data structures in RAM. If you ask for some old item
not in memory, it just gets lazily loaded.

~~~
byroot
Sure you have full permalink support, but why do you have to rely on closure
to do pagination ?

My guess: because by relying on in-memory data structures you can't do what
any half-assed PHP forum does: ad hoc queries.

~~~
nostrademons
I suspect he doesn't have to _rely_ on closures to do pagination: they're a
programming convenience that means you don't have to do things like think
about what state persists between pages.

Anything you can do with SQL you can do with in-memory data structures. If
you're interested, I'll be happy to take any SQL query and convert it to some
Python list comprehensions on arrays of dicts.
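As one worked example of that claim, here is a join with a filter, the SQL shown in a comment above its comprehension equivalent (the data is made up):

```python
# An inner join over lists of dicts, the in-memory way.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
posts = [{"user_id": 1, "title": "hi"}, {"user_id": 1, "title": "yo"},
         {"user_id": 2, "title": "ok"}]

# SELECT u.name, p.title FROM users u JOIN posts p ON p.user_id = u.id
# WHERE u.name = 'ann';
result = [(u["name"], p["title"])
          for u in users for p in posts
          if p["user_id"] == u["id"] and u["name"] == "ann"]

assert result == [("ann", "hi"), ("ann", "yo")]
```

It works, though the nested comprehension is a nested-loop join: fine in memory at small scale, exactly what a query planner would avoid at large scale.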

~~~
swah
Some statements like self-joins become relatively compact in SQL though...

BTW, do you miss Java's more advanced structures (say MultiSet) when
programming in Python/Go?

~~~
nostrademons
Pretty rarely, at least in Python. I don't miss MultiSet, because Python has
that (collections.Counter). Ditto LinkedHashMap (collections.OrderedDict).
Those are the two "extended" collections that I most often use. I do miss the
absence of balanced binary trees occasionally, since sometimes it's useful to
have an associative container with a defined iteration order, but sorted(dict)
is usually good enough where performance is critical. And Python's heapq
module is a bit harder to use than Java's PriorityQueues, but all the
functionality is there.
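The stdlib pieces mentioned, in one place (minimal illustrations):

```python
# Counter as a multiset, OrderedDict as LinkedHashMap, sorted() over a
# dict for ordered iteration, heapq as the priority queue.
import heapq
from collections import Counter, OrderedDict

ms = Counter("mississippi")             # multiset: counts per element
assert ms["s"] == 4

od = OrderedDict([("b", 2), ("a", 1)])  # preserves insertion order
assert list(od) == ["b", "a"]

d = {"b": 2, "a": 1}
assert sorted(d) == ["a", "b"]          # sorted keys when order matters

h = [3, 1, 2]
heapq.heapify(h)                        # min-heap over a plain list
assert heapq.heappop(h) == 1
```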

I think I'd miss these a bit more in Go because the built-in datatypes are
privileged in some of the language statements, but I haven't written enough Go
code to really feel their absence.

------
mcot2
A lot of these downsides are fixed by TokuMX. Real transactions, document-
level locking, compression and disk-optimized indexes. I suggest everyone take
a look at it.

~~~
ericingram
I agree. TokuMX has filled many holes in Mongo and (at least in my experience
so far) performs very well. It's got great documentation and is backed by a
brilliant team. As a drop-in replacement for the mongo binaries, it's really
easy to install, and professional/enterprise support is available if you need it.

------
danbmil99
I feel like the discussion of MongoDB is a bit like the discussion around the
Affordable Care Act (aka "Obamacare"). The conversation always shifts between
whether the very idea of a noSQL db is a good one, to the question of Mdb's
implementation faults and (I guess?) its strengths.

Whether 10gen are vapid spin-meisters or not, even whether they have developed
a usable product, seems orthogonal to the question as to whether a schemaless
persistent storage layer might be a better fit for some projects than a
relational database.

------
the1
It took minutes to mongoimport 10k small documents, and the import keeps
getting slower. Not sure what's going on.

------
joeblau
One thing I always do is try to scope a database to the problem. This site[1]
has been a valuable resource for me and a colleague who were evaluating the
best way to store/access our data depending on what we need back.

[1] - [http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-
redis](http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis)

------
dlau1
Can someone comment on how mature rethinkdb is at the moment?

I'm considering moving away from MongoDB before I have to implement what seems
to be an incredibly complicated architecture to get it to scale to the level
of tens/hundreds of millions of documents.

~~~
leif
RethinkDB is not yet "ready for production use" but reportedly will be soon.

If you're currently on MongoDB but need more performance, concurrency, or
compression, please try TokuMX: [http://www.tokutek.com/products/tokumx-for-
mongodb](http://www.tokutek.com/products/tokumx-for-mongodb) It's a drop-in
replacement server that uses a better storage engine but speaks the same
protocol and query language.

------
hannibal5
Like the article says, ZFS is pretty damn good.

A key-value store where the "engine" is ZFS works mighty well and is reliable.
There is little need for simple solutions like MongoDB if the filesystem rocks.

~~~
justincormack
Filesystems are not great as kv stores if your values are usually very small
(under page size). Well, they might still be better than some systems...

