
MongoDB and RocksDB at Parse - jasondc
http://blog.parse.com/announcements/mongodb-rocksdb-parse/
======
lpsz
MongoDB seems to have a bad rep on HN for its various shortcomings, and yet
here FB/Parse seem to use it. Is the bad rep overrated? (Honest question)

~~~
Jweb_Guru
No, the bad reputation is not overrated. This is the essence of why appeal to
authority is a fallacy. Facebook is (or was) largely written in PHP; that
doesn't mean PHP's bad reputation is overrated. Many banks run on COBOL; that
doesn't mean COBOL's bad reputation is overrated.

MongoDB's bad reputation is well-deserved and based on a long history of
misleading (or outright false) marketing claims coupled with poor technical
properties (from unavoidable data loss to horrible compaction performance to
global write locks). At this point, even if all the known issues were fixed
(and they aren't) it would still take a long time for their reputation to
recover. Given that some of MongoDB's direct competitors do _not_ suffer from
these problems and can largely function as drop-in replacements, recommending
people use something else is really the only responsible course of action.

~~~
lacker
_some of MongoDB 's direct competitors do not suffer from these problems and
can largely function as drop-in replacements_

Which competitors are you referring to? Personally I find the power and
flexibility of the MongoDB query language to be a big positive that for many
cases outweighs performance and reliability issues. It seems like no other
competitors are quite there.

~~~
benjarrell
I don't know that I would call them competitors, but DB2 and Informix both
have JSON support and both implement the MongoDB protocol.

~~~
ahachete
Also ToroDB
([https://github.com/torodb/torodb](https://github.com/torodb/torodb))
implements the MongoDB protocol. But it uses PostgreSQL as a backend and it's
open source ;P

(ToroDB developer here)

------
polskibus
What's the main use case for something like RocksDB? Is it an out-of-process
local caching engine (in-memory + unloads to disk if necessary)? Or is it
something different? Can it communicate between nodes? Why would I use it
instead of a large dictionary in memory?

~~~
wicknicks
I'm going to provide basic answers here. You can look at the homepage for more
detailed answers (the video linked there is very helpful too) [1].

Main use case: multi-threaded, low-latency data access. It is optimized for
use cases where the insert/update rate is very high. In short, it uses an LSM
tree: new and updated records go quickly into an in-memory buffer, which is
merged at frequent intervals into sorted files on disk. It is very well
engineered, making lookups very fast as well.

Queries can touch data on disk as well.
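
A toy sketch of that write path (illustrative only, not RocksDB's actual
implementation; a real LSM engine adds a write-ahead log, bloom filters, and
background compaction):

```python
from bisect import bisect_left

class TinyLSM:
    """Toy LSM store: writes land in an in-memory memtable; when it fills,
    it is flushed as an immutable sorted run (a sequential "disk" write).
    Reads check the memtable first, then runs from newest to oldest."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value); newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value                   # O(1), no random I/O
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))  # sequential flush
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):              # newest write wins
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Note how get() may have to probe several runs: that is the read
amplification that compaction later removes.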

Rocks is an embedded database. Natively written in C++, but has bindings for
other languages. [2]

No, nodes do not talk to each other.

Something I am personally excited about: RocksDB has a merge operator, which
is (probably) still in development. It allows you to update the value of an
entry by providing a merge function. It is extremely useful for merging data
if your binary format supports it (for example, protobufs do this natively,
so it can be very smart to store your protobuf binary natively in Rocks and do
regular merges to it).
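
The merge-operator idea in miniature (a concept sketch in Python only; in
RocksDB itself the merge function is supplied through the engine's options):

```python
class MergeStore:
    """Sketch of the merge-operator idea: merge() records an operand without
    reading the old value; the user-supplied merge function folds pending
    operands into the base value lazily, at read (or compaction) time."""

    def __init__(self, merge_fn):
        self.merge_fn = merge_fn
        self.base = {}       # fully merged values
        self.pending = {}    # key -> list of unmerged operands

    def put(self, key, value):
        self.base[key] = value
        self.pending.pop(key, None)

    def merge(self, key, operand):
        self.pending.setdefault(key, []).append(operand)  # write-only, no read

    def get(self, key):
        value = self.base.get(key)
        for op in self.pending.pop(key, []):              # fold on read
            value = self.merge_fn(value, op)
        if value is not None:
            self.base[key] = value                        # cache merged result
        return value

# Example: counters merged by addition, so an increment never needs a
# read-modify-write round trip.
counters = MergeStore(lambda old, op: (old or 0) + op)
```

The same shape works for any associative update: appending to a list,
unioning a set, or patching a protobuf.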

No, an in-memory dictionary will provide far fewer guarantees and features.

[1] [http://rocksdb.org/](http://rocksdb.org/)

[2] [https://github.com/facebook/rocksdb/wiki/Third-party-language-bindings](https://github.com/facebook/rocksdb/wiki/Third-party-language-bindings)

------
aikah
By the way, does anybody know how Parse executes sandboxed JavaScript on the
server? I can't find an article on that matter.

------
tfb
> The RocksDB engine developed by Facebook engineers is one of the fastest,
> most compact and write-optimized storage engines available.

Will it fix MongoDB's data loss issues? I'm hoping that is what "write-
optimized" is partly implying.

~~~
jbooth
The 'compact' and 'write-optimized' are probably there to differentiate
RocksDB from LMDB, which pretty thoroughly smokes it for read loads and has an
arguably more useful transaction model (which it pays for by single-threading
writes and having a little more write-amplification for small records).

~~~
hyc_symas
RocksDB is also single-writer.
[http://rocksdb.org/blog/521/lock/](http://rocksdb.org/blog/521/lock/)

"write-optimized" means they take great pains to turn all writes into
sequential writes, to avoid random I/O seeks and get maximum I/O throughput to
the storage device. Of course structuring data as they do makes their reads
much slower.
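
The sequential-write argument in toy-model form (the seek and write costs
below are made-up unit costs, only meant to show the shape of the trade-off):

```python
def write_costs(keys, page_of, seek_cost=10, write_cost=1):
    """Toy cost model (assumed unit costs). Update-in-place pays a seek
    whenever consecutive writes hit different pages; a log-structured
    engine appends everything sequentially after one seek to the log tail."""
    in_place, last_page = 0, None
    for key in keys:
        page = page_of(key)
        in_place += write_cost + (seek_cost if page != last_page else 0)
        last_page = page
    log_structured = seek_cost + len(keys) * write_cost
    return in_place, log_structured

# Scattered keys: every write lands on a different page.
scattered = [0, 50, 1, 51, 2, 52]
in_place, log = write_costs(scattered, page_of=lambda k: k // 10)
```

On rotating media the seek term dominates, which is the whole case for the
log-structured layout; on SSDs it shrinks toward zero, which is hyc_symas's
point.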

LMDB is read-optimized, and forgoes major effort at those types of write
optimizations because, quite frankly, rotating storage media is going the way
of the magnetic tape drive. Solid state storage is ubiquitous, and storage
seek time just isn't an issue any more.

(Literally and figuratively - HDDs are not going extinct; because of their
capacity/$$ advantage they're still used for archival purposes. But everyone
doing performance/latency-sensitive work has moved on to SSDs, Flash-based or
otherwise.)

"compact" doesn't make much sense. There's nothing compact about the RocksDB
codebase. Over 121,000 lines of source code
[https://www.openhub.net/p/rocksdb](https://www.openhub.net/p/rocksdb) and
over 20MB of object code. Compared to e.g. 7,000 lines of source for LMDB and
64KB of object code.
[https://www.openhub.net/p/MDB](https://www.openhub.net/p/MDB)

~~~
jbooth
I hear you re: compact source code, and that, as much as the benchmarks, is
why I use LMDB (thanks) and not Rocks when I have a need.

I was under the impression that Rocks manages more compact storage, probably
as another consequence of all those sequential writes being packed right next
to each other, rather than LMDB's freelist-of-4k-pages model.

Is that the case, or was I misreading whatever mailing list I got that from?
Don't get me wrong, I value not having compactions more than I value slightly
less write amplification; just checking my understanding here.

~~~
hyc_symas
RocksDB is more compact storage-wise when records are small. Notice here
[http://symas.com/mdb/ondisk/](http://symas.com/mdb/ondisk/) that RocksDB
space is smaller using 24 byte values, but same or larger at 96 byte values.
By the time you get to 768 byte values, LMDB is smallest.
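
One way to see why record size flips the comparison: a fixed per-record cost
(the 16 bytes below is an assumed figure for illustration, not a number from
either engine) dominates tiny values and vanishes for large ones:

```python
def overhead_fraction(value_size, per_record_overhead=16):
    """Fraction of stored bytes that is metadata, given a fixed (hypothetical)
    per-record overhead and a value of the given size."""
    return per_record_overhead / (per_record_overhead + value_size)

# Value sizes from the benchmark above: 24, 96, and 768 bytes.
for size in (24, 96, 768):
    print(size, round(overhead_fraction(size), 3))
```

An engine with a larger per-record cost (page slots, tree nodes) pays
proportionally more at 24-byte values and roughly nothing by 768 bytes, which
is consistent with the crossover described above.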

~~~
jbooth
Cool, thanks for the response and for writing lmdb!

------
lighthawk
"And we now have some exciting news to share: we are running MongoDB on
RocksDB in production for some workloads and we’re seeing great results!"

Some stats and a way to replicate would be nice here.

~~~
spimmy
Like I said in the post, we're preparing a series of blog posts for next week
that does a deep dive into our workloads, our benchmarks, how to repro, etc.

Spoiler alert: we consistently get around _90%_ compression and the inserts
are ~50x faster. ^_^

~~~
andrea_s
Compared to MMAPv1 or compared to WiredTiger?

~~~
spimmy
We're testing with both. Somewhat hindered by the fact that WT often dies
trying to import our data, or crashes, or loses data in weird ways. :( I
believe WT will eventually get there, but rocks has been unfathomably more
solid for us so far.

~~~
diziet
WT has been pretty decent for us (only TB-level amounts of data, so probably
not as much as you folks are dealing with). The thing it has issues with is
_occasionally_ using the wrong indexes and caching them, and even putting
hints in does not solve the issue in all cases. It started happening on WT,
but it seems to be mongo's issue and not the storage engine's.

~~~
spimmy
We have a lot of edge cases in our data set and usage model, no question. Glad
it's working well for you. :)

