Show HN: Implementing durability for in-memory databases, on SSDs (eferm.com)
40 points by Emore 2264 days ago | 4 comments

Excellent write-up.

Instead of only the BTree vs. BTree durable comparison in qps, I think it would be helpful to add plots with the following information:

1) Increase in write speed gained from batching. Specifically, in section 3 you mentioned that the constant time overhead was your motivation to design the batching system. In a world of SSD writes, the usefulness of batching comes into question. Seeing a measured gain would put this question to rest.

2) The effect of batching on the write time of clients. The write-up focuses on reducing the db's workload, which is the correct problem to focus on, as that is the bottleneck. But there is a cost on the client, which comes from waiting on the writes of other concurrent clients. I would assume that your solution actually reduces the overall cost for clients as it reduces the burden of serialization, but it would be interesting to see this data.

Thank you!

1) I didn't include it in the plot since I found it to remain constant, at ~300 requests/second. But yes, perhaps it would show a nice contrast to batching if graphed along with it.

2) This is an interesting question. I think it depends on how a non-batching solution is implemented: A) if each client obtains its own file handle, fsync()s can theoretically be issued 'slightly more concurrently'; B) if a unique file handle is the only one upon which fsync()s are called, this is essentially my solution with a dedicated thread, but which only writes one update at a time.

For A, I'd guess a client would have to wait less in not-very-concurrent environments, while in B the wait time grows longer for "unlucky" concurrent threads. But I think a batching solution would overtake A in even moderately concurrent environments, since -- as you say -- it reduces the burden of serialisation.
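
To make the comparison concrete, here is a minimal sketch of the dedicated-thread batching idea in Python. It is not the code from the article: the class name `BatchingLog`, the `max_batch` parameter and the `fsync_calls` counter are all hypothetical names chosen for illustration. Clients block in `write()` until the single writer thread has flushed and fsync()ed the batch containing their record, so one fsync() can cover many concurrent updates.

```python
import os
import queue
import threading


class BatchingLog:
    """Hypothetical append-only log: one writer thread, one fsync() per batch."""

    def __init__(self, path, max_batch=64):
        self._file = open(path, "ab")
        self._queue = queue.Queue()
        self._max_batch = max_batch      # assumed cap on records per batch
        self.fsync_calls = 0             # counted only for illustration
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, record: bytes):
        """Block the calling client until its record has been fsync()ed."""
        done = threading.Event()
        self._queue.put((record, done))
        done.wait()

    def close(self):
        self._queue.put(None)            # sentinel: stop the writer thread
        self._thread.join()
        self._file.close()

    def _run(self):
        while True:
            item = self._queue.get()     # wait for the first pending update
            if item is None:
                return
            batch = [item]
            stop = False
            # Drain whatever else queued up in the meantime, up to max_batch.
            while len(batch) < self._max_batch:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            self._file.write(b"".join(rec for rec, _ in batch))
            self._file.flush()
            os.fsync(self._file.fileno())  # one fsync covers the whole batch
            self.fsync_calls += 1
            for _, done in batch:
                done.set()               # release every client in the batch
            if stop:
                return
```

Setting `max_batch=1` turns this into case B (a single handle writing one update per fsync()), so the same class could be used to compare client wait times with and without batching. Comparing `fsync_calls` against the number of records written gives a rough idea of how much batching actually happened under a given level of concurrency.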

Given time I'll try to revisit the code and try out different kinds of tests!

Great blog post from the perspective of an educated enthusiast feeling their way around and trying to take advantage of new hardware to solve old problems.

For those who are interested in the intersection of databases and SSDs, take a look at http://rethinkdb.com/ . They have been covered here before: http://news.ycombinator.com/item?id=1235545

A very approachable and enjoyable read. How close to "production quality" is the author's system?
