
It is durable from an OS perspective. I started typing in a long explanation, but I think the comment for the "sync" field in the options.h file captures things well:

  struct WriteOptions {
    // If true, the write will be flushed from the operating system
    // buffer cache (by calling WritableFile::Sync()) before the write
    // is considered complete.  If this flag is true, writes will be
    // slower.
    //
    // If this flag is false, and the machine crashes, some recent
    // writes may be lost.  Note that if it is just the process that
    // crashes (i.e., the machine does not reboot), no writes will be
    // lost even if sync==false.
    //
    // In other words, a DB write with sync==false has similar
    // crash semantics as the "write()" system call.  A DB write
    // with sync==true has similar crash semantics to a "write()"
    // system call followed by "fsync()".
    //
    // Default: false
    bool sync;
  };
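To make the two modes concrete, here is a minimal sketch that uses raw POSIX calls rather than LevelDB itself. `AppendRecord` is a hypothetical helper, not part of LevelDB's API. With `sync == false` it stops after `write()`, so the bytes sit in the OS buffer cache; with `sync == true` it also calls `fsync()` before returning.

```cpp
// Sketch only: AppendRecord is a hypothetical helper illustrating the
// write() vs. write()+fsync() crash semantics described in the comment above.
#include <fcntl.h>   // open() and O_* flags for callers
#include <unistd.h>  // write(), fsync(), ssize_t

#include <cerrno>
#include <cstddef>
#include <string>

// Appends `data` to the file open on `fd`.
// sync == false: data reaches the OS buffer cache only. It survives a
//   process crash but may be lost if the machine crashes.
// sync == true: fsync() forces it to stable storage first (slower).
bool AppendRecord(int fd, const std::string& data, bool sync) {
  const char* p = data.data();
  std::size_t left = data.size();
  while (left > 0) {
    ssize_t n = ::write(fd, p, left);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return false;
    }
    p += n;
    left -= static_cast<std::size_t>(n);
  }
  // Only pay for fsync() when the caller asked for durability.
  return !sync || ::fsync(fd) == 0;
}
```

The per-call `sync` flag mirrors the per-write choice in `WriteOptions`: callers pay the `fsync()` cost only for writes that must survive a machine crash.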

We don't have much experience with scaling to larger databases yet. There are known problems that will cause a (smallish) constant-factor slowdown in write performance once the database grows to a few (10?) GB, but I don't recall the details all that well, and the implementation has changed somewhat since that experiment. I would like to characterize this better and fix things so we can support databases in the 100 GB to 1 TB range well. It just hasn't become a priority yet.

The benchmark numbers on the linked page were from a small, million-entry database that easily fits in the OS buffer cache.




Welcome to HN commenting, Sanjay!

(For those that aren't familiar, his bio is here: http://research.google.com/people/sanjay/index.html)


Writes being lost means what, a trashed file? Or merely an incomplete one?


Merely an incomplete one. Leveldb never writes in place: it always appends to a log file, or merges existing files together to produce new ones. So the worst an OS crash can do is leave a partially written log record (or a few partially written log records). Leveldb's recovery code uses checksums to detect this and skips the incomplete records.
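A minimal sketch of the recovery idea, assuming a hypothetical record format of a 4-byte length, a 4-byte checksum, and then the payload. This is not LevelDB's actual log format (which is block-structured and uses CRC32C); plain CRC-32 is swapped in here for simplicity. Because this simplified format has no block framing, recovery stops at the first record that is cut off or fails its checksum, which is where a torn append lands in an append-only log.

```cpp
// Sketch only: EncodeRecord/RecoverRecords and the record layout are
// hypothetical, chosen to illustrate checksum-based crash recovery.
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise version.
uint32_t Crc32(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i) {
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

void AppendU32(std::string* dst, uint32_t v) {  // little-endian
  for (int i = 0; i < 4; ++i) dst->push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

uint32_t ReadU32(const char* p) {  // little-endian
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<uint32_t>(static_cast<unsigned char>(p[i])) << (8 * i);
  }
  return v;
}

// Record = [length][crc32(payload)][payload].
std::string EncodeRecord(const std::string& payload) {
  std::string out;
  AppendU32(&out, static_cast<uint32_t>(payload.size()));
  AppendU32(&out, Crc32(payload));
  out += payload;
  return out;
}

// Returns every intact record, stopping at the first truncated or
// checksum-failing one (the partially written tail left by a crash).
std::vector<std::string> RecoverRecords(const std::string& log) {
  const std::size_t kHeader = 8;
  std::vector<std::string> records;
  std::size_t pos = 0;
  while (log.size() - pos >= kHeader) {
    uint32_t len = ReadU32(log.data() + pos);
    uint32_t crc = ReadU32(log.data() + pos + 4);
    if (log.size() - pos - kHeader < len) break;  // payload cut off
    std::string payload = log.substr(pos + kHeader, len);
    if (Crc32(payload) != crc) break;             // garbled record
    records.push_back(std::move(payload));
    pos += kHeader + len;
  }
  return records;
}
```

Since nothing is ever overwritten in place, the records before the damaged tail are untouched by the crash and are all recovered.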


Thanks!



