
MongoDB CTO: How our new storage engine will earn its stripes - lucasjans
http://www.zdnet.com/mongodb-cto-how-our-new-wiredtiger-storage-engine-will-earn-its-stripes-7000036047/
======
fleitz
BDB... seriously?

At this point why not just:

apt-get install postgresql && psql -c "CREATE TABLE mongodb (key varchar(1024)
NOT NULL PRIMARY KEY, value jsonb);"

then you get the best of both worlds.

~~~
eknkc
Easy replication & sharding is appealing.

~~~
eddd
And replication for Postgres is not easy? Sharding in Postgres requires some
plproxy/pgbouncer knowledge, but they don't cause as many problems in the
future as MongoDB. Personally, when it comes to databases, I go for
reliability first; then I can play with scalability.

~~~
remon
That almost certainly implies you're not working on anything that requires
horizontal scalability. That's a luxury not everyone can afford. Scalability,
in almost all cases that actually require it, is the first priority.

It's like saying it's more important that the car drives reliably than that
everyone who needs to travel fits into the car in the first place.

~~~
davidw
I think there are probably way too many people out there doing scaling before
they're doing "getting things right". It's premature optimization that causes
a lot of problems.

Naturally, there are plenty of people who do need to scale a lot, too, but
presumably by the time they need to, they have a good idea of the problem and
exactly how best to scale it.

~~~
remon
That may be generally true but those two things are not mutually exclusive.
Scalability doesn't come at the expense of getting things right; it just might
come at the expense of certain features, which applies to most distributed
system problems.

~~~
CHY872
Vertical scaling can take you a surprisingly long way.

------
wiremine
I'm not a huge MongoDB fan, but I think it is great to see competition and
evolution in this space.

From the article:

"In addition to compression and record-level locking, WiredTiger also gives
MongoDB multi-version concurrency control (MVCC), multi-document transactions,
and support for log-structured merge-trees, or LSM trees, for very high insert
workloads"

I'd also love to see them add document joins at some point. That's a big
differentiating feature for RethinkDB, although I've found Rethink's syntax to
be difficult to reason through.

~~~
remon
Why do you consider joins a big differentiating feature for NoSQL storage
solutions?

~~~
personZ
Most real world data is some mix of relational and unique / big bag of
properties. The best solution is often somewhere in between (which is, for me,
probably pgsql and its support of JSON, though I've achieved the same
versatility with xml data types).
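
A minimal sketch of that in-between design, under stated assumptions: relational columns for the parts that join cleanly, plus a JSON "bag of properties" per row. Python's built-in sqlite3 stands in for PostgreSQL here only so the example is self-contained (in PostgreSQL the `properties` column would be `jsonb`), and every table and column name is hypothetical.

```python
import json
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs anywhere; table and
# column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        properties  TEXT NOT NULL  -- JSON document; jsonb in PostgreSQL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?)",
    (10, 1, json.dumps({"gift_wrap": True, "notes": ["fragile"]})),
)


def orders_with_customer(conn):
    """Join on the relational columns, then decode each row's JSON bag."""
    rows = conn.execute(
        "SELECT o.id, c.name, o.properties "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "ORDER BY o.id"
    )
    return [(oid, name, json.loads(props)) for oid, name, props in rows]


print(orders_with_customer(conn))
```

The join runs in the database while the free-form properties stay schemaless, which is roughly the mix described above.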

------
t1m
Here is a link to some more 'thorough' benchmarks that compare WiredTiger to
many other database back-ends.

[http://symas.com/mdb/inmem/](http://symas.com/mdb/inmem/)

~~~
hyc_symas
I'd say the MongoDB folks made a pretty safe choice here, since WiredTiger
offers both LSM and Btree engines. I've recently completed some more
benchmarks on all of the above-tested engines, and will be publishing the
report soon (as soon as it's done being written...). Testing on-disk, with DBs
5x larger than RAM, at data sizes of 24, 96, 384, 768, 2000, and 4000 bytes
per record. The important result to note is that LSM's high write speed
advantage only exists for small record sizes, and for short durations. On long
duration tests the compaction overhead stops them in their tracks, and at data
sizes above 2000 bytes per record, LSM write amplification is too expensive
(regardless of test duration). I.e., above 2000 byte records, Btrees are
always faster than LSMs.

This result is particularly relevant for MongoDB, since a document store tends
to have large records containing multiple fields, as opposed to individual
values being stored as separate small records.

And in cases where that applies (large records), nothing comes anywhere close
to LMDB's performance.

(A preview of the on-disk test results I mentioned was presented at
BuildStuff.LT last week:
[http://symas.com/mdb/20141120-BuildStuff-Lightning.pdf](http://symas.com/mdb/20141120-BuildStuff-Lightning.pdf),
pages 103 onward.)
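
To make the record-size argument above concrete, here is a toy write-amplification model. It is a rough sketch, not a reproduction of the benchmarks: the formulas, the 4096-byte page size, and the LSM level count and fanout are all assumptions chosen for illustration, and only the record sizes come from the comment. It shows the direction of the trend (the B-tree's cost per byte of user data falls as records grow, while the LSM estimate stays flat), not where the crossover measured in the tests lies.

```python
# Toy write-amplification model; every formula and default here is an
# assumption for illustration, not a measurement from the benchmarks.
RECORD_SIZES = [24, 96, 384, 768, 2000, 4000]  # bytes, as listed in the comment


def btree_write_amp(record_size, page_size=4096, pages_per_update=1):
    """Bytes written per byte of user data when one update rewrites whole pages.

    pages_per_update is hypothetical: 1 models an in-place update of a single
    leaf page; a copy-on-write tree would rewrite more pages per update.
    """
    pages_for_record = -(-record_size // page_size)  # ceiling division
    pages = max(pages_per_update, pages_for_record)
    return pages * page_size / record_size


def lsm_write_amp(levels=4, fanout=10):
    """Rough leveled-compaction estimate: one log write and one flush, plus
    up to `fanout` rewrites at each level. Independent of record size."""
    return 2 + levels * fanout


for size in RECORD_SIZES:
    print(f"{size:>5} B  btree ~{btree_write_amp(size):7.2f}x  lsm ~{lsm_write_amp()}x")
```

Write amplification is only one input to throughput, so this model should not be read as predicting the measured results.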

~~~
eis
Howard, I have always found your benchmarks, as well as your comments and
explanations of the results, to be very thorough and easy to understand. So
thanks for that. I wonder if you have had a look at ForestDB, which is also
based on a B+-tree-like structure instead of LSM and claims to have very low
write amplification. They have published benchmarks[1], but neither LMDB nor
WiredTiger is included.

[1]
[https://github.com/couchbaselabs/forestdb/wiki/Performance-R...](https://github.com/couchbaselabs/forestdb/wiki/Performance-Results)

~~~
hyc_symas
By the way, for those of you who are happy with just the raw data and coming
to your own conclusions, you can get the advance look here:
[http://symas.com/mdb/ondisk/](http://symas.com/mdb/ondisk/)

~~~
hyc_symas
Page now updated with formatted results and conclusions.

------
dschiptsov
But the old one is crap, right?

------
qwerta
I am glad they finally introduced LSM trees and per-document locking. But I
think that's too little, too late :-)

~~~
remon
Why on earth would it be too late? MongoDB is comfortably in the top 3 in
terms of traction and they just fixed what many consider the primary technical
argument against using it. Too late for what?

~~~
davidw
Here's my guess about some of the most utilized/popular databases:

* SQLite - it's everywhere thanks to mobile phones.

* Oracle.

* Microsoft's thing has to have a big following.

* MySQL.

* DB2 is not as big as Oracle, but IBM isn't some lightweight, either.

MongoDB gets talked about a lot, and probably used among people trying new
stuff out, but I think a lot of mostly silent people wouldn't touch something
like that for their important data with a 10 foot pole.

~~~
dkhenry
I think you overestimate the reach of Oracle. They are nowhere outside the
Fortune 500, and that's not a lot of deployments.

MySQL is most likely the most deployed, thanks to the LAMP stack's
proliferation across all sorts of small and medium sites. After that it really
would be a toss-up between MS-SQL, which has traction in the SMB space due to
its being used for SharePoint and Exchange, and PostgreSQL.

But let's be honest: the most utilized database is Access.

~~~
dagw
I do a lot of work with geo-spatial analysis and often work with various
counties and municipalities. In my experience, everybody has an Oracle database
stuffed away somewhere storing the region's canonical geo-spatial data.

~~~
dkhenry
Even if each town had an Oracle database squirreled away (which they don't),
that still would not even come close to just the WordPress deployments (60
million).

------
jitix
I have been waiting for this for a long time. Having implemented an OLTP
application using MongoDB (luckily its write loads were low), I stopped
using Mongo 2 years ago because of the write bottleneck (it was still good
for OLAP, though). Good decision by the MongoDB team, since they already have
the market traction.

