tfb's comments | Hacker News

> The RocksDB engine developed by Facebook engineers is one of the fastest, most compact and write-optimized storage engines available.

Will it fix MongoDB's data loss issues? I'm hoping that is what "write-optimized" is partly implying.

-----


Which data loss issues? You're gonna have to be more specific. ;)

It doesn't fix the election rollback issue, because that's handled way above the storage layer. It does solve a whole slew of storage engine related write issues though. No more "we flush to disk every 100ms and call it good".

-----


The 'compact' and 'write-optimized' labels are probably there to differentiate RocksDB from LMDB, which pretty thoroughly smokes it for read loads and has an arguably more useful transaction model (which it pays for by single-threading writes and having a little more write amplification for small records).

-----


RocksDB is also single-writer. http://rocksdb.org/blog/521/lock/

"write-optimized" means they take great pains to turn all writes into sequential writes, to avoid random I/O seeks and get maximum I/O throughput to the storage device. Of course structuring data as they do makes their reads much slower.
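The structure behind that tradeoff is the log-structured merge (LSM) tree. A toy sketch (illustration only; the real thing adds an on-disk write-ahead log, background compaction, bloom filters, and much more):

```javascript
// Toy LSM write path: writes go to an in-memory table (on disk this would be
// a strictly sequential log append), and full memtables are flushed as
// sorted, immutable runs. Names and the tiny memtable limit are illustrative.
class ToyLSM {
  constructor(memtableLimit = 4) {
    this.memtableLimit = memtableLimit;
    this.memtable = new Map(); // recent writes, held in memory
    this.runs = [];            // immutable sorted runs, newest last
  }
  put(key, value) {
    // Every write is an append/update -- no random I/O at all.
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }
  flush() {
    // The memtable is written out as one sorted, immutable run.
    const run = [...this.memtable.entries()].sort((a, b) => (a[0] < b[0] ? -1 : 1));
    this.runs.push(run);
    this.memtable.clear();
  }
  get(key) {
    // Reads pay for the write speed: check the memtable, then every run,
    // newest first. Compaction exists to keep this search short.
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (let i = this.runs.length - 1; i >= 0; i--) {
      const hit = this.runs[i].find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }
}
```

The newest-run-first search in `get` is exactly the read penalty being described: the more runs accumulate, the more places a read has to look.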

LMDB is read-optimized, and forgoes major effort at those types of write optimizations because, quite frankly, rotating storage media is going the way of the magnetic tape drive. Solid state storage is ubiquitous, and storage seek time just isn't an issue any more.

(Literally and figuratively - HDDs are not going extinct; because of their capacity/$$ advantage they're still used for archival purposes. But everyone doing performance/latency-sensitive work has moved on to SSDs, Flash-based or otherwise.)

"compact" doesn't make much sense. There's nothing compact about the RocksDB codebase. Over 121,000 lines of source code https://www.openhub.net/p/rocksdb and over 20MB of object code. Compared to e.g. 7,000 lines of source for LMDB and 64KB of object code. https://www.openhub.net/p/MDB

-----


I hear you re: compact source code, and that, as much as the benchmarks, is why I use LMDB (thanks) and not Rocks when I have a need.

I was under the impression that Rocks manages more compact storage, probably as another consequence of all those sequential writes being packed right next to each other, rather than LMDB's freelist-of-4k-pages model.

Is that the case or was I misreading whatever mailing list I got that from? Don't get me wrong, I value not having compactions more than slightly less write amplification, just checking my understanding here.

-----


RocksDB is more compact storage-wise when records are small. Notice here http://symas.com/mdb/ondisk/ that RocksDB's space use is smaller with 24-byte values, but the same or larger at 96-byte values. By the time you get to 768-byte values, LMDB is smallest.

-----


Cool, thanks for the response and for writing lmdb!

-----


How would RethinkDB's real-time capabilities compare to doing something like this with Mongo? (I'm genuinely curious about any shortcomings with the following implementation, besides scaling.)

    // plugins/replicate.js

    var replicator = require('replicator');

    module.exports = exports = function replicatePlugin (schema, options) {
      schema.pre('save', function (next) {
        // notify subscribers of save
        replicator.notify('save', this);
        next();
      });
    }



    // schemas/game.js

    var replicatePlugin = require('../plugins/replicate.js');
    var GameSchema = new Schema({ ... });
    GameSchema.plugin(replicatePlugin);

-----


I don't know what the replicator module does, but on RethinkDB's end, it allows you to avoid polling (which is a scalability win; lots of really nice library APIs are built on top of dead-slow polling of the database).

RethinkDB also lets you write queries and subscribe to changes on the query itself. In Mongo, you can tail the replication oplog to avoid polling, but you still need to filter every event happening on the database for the ones you're interested in. On top of that, if you need to transform the data, you have to re-apply those transformations to what you get from the oplog. With RethinkDB, you write the same query you would have anyway, and the database can be very efficient, sending only the changes you're actually interested in, with the transformations you asked for already applied.

Check out these slides if you're interested: http://deontologician.github.io/node_talk/#/

-----


I'll start posting about it when things are further along. Unfortunately, I'm only one man and lack the resources to fund a team, unlike Facebook. It's just frustrating to see all this hype surrounding React/Relay when I've implemented essentially this exact functionality as part of a much larger project. For the past year (at least), I've considered this functionality to be fairly basic and implicit to modern web apps.

However, millions of people will see it eventually. Epic Games will almost certainly be using it for the community portion of their new Unreal Tournament.

This thread on their forums details some of the functionality on a more abstract level as it exists thus far: https://forums.unrealtournament.com/showthread.php?14859-Web...

-----


I'm not sure what you've come up with, since your post was deleted, but I'm looking forward to it!

I also home-brewed a system very similar to what I described at my previous job. From the information currently out there, it looks almost the same as Relay, except without GraphQL; we implemented our own query/synchronization mechanism.

-----


Sorry. I was hesitant to post the comment to begin with, given that I was 100% sure it would be downvoted, which it was, literally within a minute of posting it. You can check my comment history for details on the project.

-----


> Anyone born since ~2000 perhaps is in this situation. Will they have eliminated this kind of "fun" "bullshit" from their behavior completely?

They find different ways of having fun and bullshitting.

-----


The best kinds of side projects are the fun ones. When you really enjoy working on something, you really don't need to "stay motivated".

I love to code and I love to build things, and I would lose my sanity if there were only one thing I ever worked on. Plus, one of the best things about side projects is that there's usually no pressure to really get them done, so when I load my workspace for a side project, I'm always excited to get started, because I know nothing really important depends on it. And unsurprisingly, I've learned so much from random fun stuff on the side that I would never have learned otherwise.

-----


Merry Christmas, everyone. I tend to lurk a lot, but this is one of the few programming communities where I don't expect to be met with negativity and condescension every time I post something. And beyond that, just reading everyone's discussions has easily helped shape me into the person I am today. Thanks for the past few years and many more to come!

-----


You must be new here. Wait until the negativity and condescension kicks in.

-----


If there is some of that, it's still at a dramatically lower rate than on any other boards I've been looking at.

This is refreshing and always good for the mind, scientific as that may or may not be.

-----


> There's also the fact that it won't lose its data on power loss.

Does this mean that for software to take full advantage of this, the software will need to be updated to account for it? For instance, Redis loads data into memory from disk upon starting, but if something like this 400GB SSD is available as memory (and say for instance, 300GB is in use by Redis), wouldn't it make sense for Redis, upon starting, to just "remember" the state of the memory rather than reloading it from disk?

-----


Not really; we already have suspension (to RAM and to disk), and this can work just like it. For userspace software, it'll be like it never stopped running.

-----


No, that's not correct. This is MUCH MUCH slower than RAM.

You would not use this to replace RAM, but rather have a second section of memory for persistent data.

-----


In the hypothetical world where this was as fast as real RAM, it would be cool if the OS were smart enough to read mmap'd data onto it and remember "this file was mmap'd here and hasn't changed since; next time it's mmap'd we can just reuse it". Instantly you'd have support for all the software that just uses mmap (comedy MongoDB option?).

-----


https://www.loggur.com is like Firebase on steroids, though it only launched a few weeks ago. There is a lot to come: first, proper documentation for the REST API behind Loggur (should developers prefer to talk to it directly), and then hopefully huge strides in usability. But we need tons of feedback for the latter!

-----


