Dynamically Adjustable Key-Value Store by Combining LSM and COW B+ Tree [pdf] (greensky00.github.io)
90 points by ngaut 10 days ago | 14 comments

Looks like the team presented at HotStorage 2019[1] in July; the slides from their talk are at [2].

[1] https://www.usenix.org/conference/hotstorage19/workshop-prog...

[2] https://www.usenix.org/sites/default/files/conference/protec...

Did they benchmark against B+Trees and LSM? I don’t see that in the slides.

Slides 13, 14 and 15 show Jungle's performance using a range of compaction factors (C=[2, 3, 5, 10]) measured against an LSM-tree using leveled compaction and an LSM variant using tiered (aka size) compaction.

Having said this, the paper itself (in the original link) under Section 4 "Evaluation" describes the differences in more detail, and Figure 6 probably does the best job of showing Jungle benefits in a compact human-readable line of charts.

If I'm reading the paper correctly, and as summarized on slide 16, using the combination of CoW B+ and LSM means that instead of a 3-way tradeoff between Read/Write/Space, Jungle can minimize the cost trade-off such that the only remaining material trade-off is Write/Space.

Pretty cool stuff.

I looked at those slides twice and only saw them as comparing different settings of their algorithms. I dunno if that says more about me or how long PhD students (haven’t) spent with Tufte.

Maybe three colors for the bars.

>>[...] how long PhD students (haven’t) spent with Tufte.

I think this :) I also had to look more closely a few times.

It's exciting that things are happening again in db-land. A few years ago it was Fractal Trees, then leveldb arrived and got wider adoption, then Facebook put it in MySQL with MyRocks, etc.

Very recently timescaledb did some really interesting stuff with compressing tables, and I’m wondering if that is usable with non-time-series data too?

I’m a heavy tokudb user because of the compression and I’m looking forward to seeing if a b+ lsm with compression is going to turn up in MySQL or even better Postgres.

Is there any reason Postgres or other projects could not adopt tokudb's solution?

Architecturally, MySQL went with a “the storage engine is a plug-in” design and Postgres went with a more traditional “the storage is part of the core” design.

Postgres has slowly grown some ... tolerance ... for alternative storage engines, via federation and forks like greenplum and timescaledb that put their own engines in, but they always feel like second-class citizens and keep hitting integration limits.

What made you pick TokuDB instead of MyRocks?

I got the impression that TokuDB is pretty much dead. Hasn't Percona stopped updating TokuDB and is focusing on MyRocks instead?

I picked tokudb before myrocks existed.

Tokudb runs rings around myrocks. In particular, it has better read/write perf and much better compression.

But it will disappear by MySQL 9. Tokudb died because it was neither widely adopted nor sponsored, not because it was worse tech.

I think the recent timescaledb stuff about compression is really promising. I hope Postgres moves towards built-in compression rather than telling people the fs should do it.

I hope TimescaleDB compression level will eventually reach VictoriaMetrics compression level for typical time series data :) [1]

[1] https://medium.com/@valyala/measuring-vertical-scalability-f...

Is source code available?

How does the performance compare to B-epsilon trees, which are B-trees with a write buffer in each node to make writes more efficient?
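
For anyone unfamiliar with the "write buffer in each node" idea, here is a rough toy sketch in Python. It is not from the Jungle paper or from any real B-epsilon implementation: the class name `BufferedNode`, its fields, and the fixed two-level layout are all made up for illustration. A real B-epsilon tree has many levels, splits nodes, and usually flushes only to the child with the most pending messages; this toy pushes everything down at once.

```python
import bisect


class BufferedNode:
    """Toy two-level tree: one root with a write buffer sitting over a few leaves.

    Hypothetical illustration only; the names and the layout are assumptions.
    """

    def __init__(self, pivots, buffer_capacity=4):
        self.pivots = list(pivots)  # sorted keys that separate the leaves
        self.leaves = [dict() for _ in range(len(self.pivots) + 1)]
        self.buffer = {}            # pending writes not yet pushed down
        self.buffer_capacity = buffer_capacity
        self.flushes = 0            # how many batched flushes have happened

    def _leaf_for(self, key):
        # Keys equal to a pivot go to the leaf on the right of that pivot.
        return self.leaves[bisect.bisect_right(self.pivots, key)]

    def put(self, key, value):
        # A write only touches the in-node buffer until the buffer fills up.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self):
        # Push all buffered writes down in one batch (simplification, see above).
        for key, value in self.buffer.items():
            self._leaf_for(key)[key] = value
        self.buffer.clear()
        self.flushes += 1

    def get(self, key):
        # The buffer holds the newest data, so check it before the leaf.
        if key in self.buffer:
            return self.buffer[key]
        return self._leaf_for(key).get(key)
```

The point of the sketch is that several `put` calls are paid for with a single batched trip to the leaves, while `get` has to look in the buffer first because it may hold a newer value than the leaf.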

Looks like a great candidate for bolt (B+) and badger (LSM).
