
LevelDB: SSTable and Log-Structured Storage (2012) - jxub
https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
======
VeejayRampay
LevelDB is really cool but I have to admit I still struggle with the variety
of datastores where the design of your keys has such a huge influence down
the line. I remember using the Node variety (levelup/leveldown) and seeing all
kinds of weird patterns when dealing with ranges (using \xff as a marker for
the end of the range) and how to access "groups" of related data.
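
For readers who haven't seen the `\xff` pattern, here is a minimal sketch of it. `SortedKV` and `prefix_range` are hypothetical names for a toy in-memory ordered store, not the levelup/leveldown or LevelDB API. The trick assumes keys are ASCII/UTF-8 text, where the byte 0xff never appears, so `prefix + b"\xff"` sorts after every key that starts with the prefix.

```python
import bisect


class SortedKV:
    """Toy ordered key-value store (hypothetical stand-in, not LevelDB's API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted byte order
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, gte: bytes, lt: bytes):
        """Return (key, value) pairs with gte <= key < lt, in key order."""
        lo = bisect.bisect_left(self._keys, gte)
        hi = bisect.bisect_left(self._keys, lt)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def prefix_range(db: SortedKV, prefix: bytes):
    """Fetch a 'group' of related keys by scanning [prefix, prefix + 0xff)."""
    return db.range(prefix, prefix + b"\xff")


db = SortedKV()
db.put(b"user:1:name", b"ada")
db.put(b"user:1:email", b"ada@example.com")
db.put(b"user:10:name", b"grace")
db.put(b"order:9", b"widget")

user_1 = prefix_range(db, b"user:1:")
```

Note that the trailing `:` in the prefix is what keeps `user:10:name` out of the `user:1:` group; that kind of detail is exactly where the key design starts to matter.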

How simple it is to set up and get going is still extremely appealing though.

~~~
lootsauce
Agreed. Reaching back into my memory banks: when Bigtable first became
available for App Engine, I was fascinated by the prospect of working with a
Google-scale database. But there were so many odd, hard-to-reason-about
vagaries of squeezing that performance out of the database by means of key
structure. The developer ergonomics seemed simple at first but in the end were
not at all what I expected.

~~~
ddorian43
.. Is that magic that you want in any other product? The `simplest` I know
are Redis commands that show the taken.

~~~
foobarchu
To me, it's not about magic, it's about the abstraction. Most implementations
of this style of database _demand_ that you consider internals of the engine
before storing anything. If you don't design your keys the right way, then
you're screwed down the road, and there's no way to fix it without just
copying the data to a new table (and losing metadata like TTL). When I use
something like Redis, or any SQL database, I don't need to consider the inner
workings of it unless I'm trying to absolutely max out my performance (one
slight exception being atomicity of Redis commands).
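
One common illustration of a key-design mistake (an assumed example, not necessarily the one I hit): in a store that orders keys by raw bytes, unpadded numeric IDs sort lexicographically, so range scans come back in the wrong order. The helper names `naive_key` and `padded_key` below are hypothetical.

```python
def naive_key(user_id: int) -> bytes:
    """Key with the ID written as-is; sorts as text, not as a number."""
    return f"user:{user_id}".encode()


def padded_key(user_id: int, width: int = 10) -> bytes:
    """Key with a zero-padded ID, so byte order matches numeric order."""
    return f"user:{user_id:0{width}d}".encode()


ids = [1, 2, 10]

# Byte-wise ordering, which is what an ordered key-value store would use.
naive_order = sorted(naive_key(i) for i in ids)
padded_order = sorted(padded_key(i) for i in ids)
```

With the naive scheme, `user:10` lands between `user:1` and `user:2`. Switching to the padded scheme later means rewriting every key, which is the "copy the data to a new table" problem.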

But with LSM style databases, you are going to have a _really_ bad time if you
don't have a team member who has dedicated serious time to understanding the
internal workings of LSM itself, and the details of your chosen
implementation. That's a real mark against it, IMO.

tl;dr: LSM databases are like a colander in the world of leaky abstractions.

~~~
ddorian43
There is MyRocks, where you work like with the usual MySQL. When you want a
distributed DB, yeah, you need to design. But comparing RocksDB vs LMDB, the
second is easier, but not by much?

Do you think you can use CitusDB without any design consideration and still
expect good performance in a sharded environment?

~~~
foobarchu
> When you want a distributed DB, yeah, you need to design

But one shouldn't be designing for implementation details. Usually, we start
technologies out with leaky abstractions, and gradually get better at it. A
good example is game development, where it used to be that you always used the
drawing method of the display and the clock speed of the cpu to your
advantage. Nowadays, we've moved past that, because it was working on horrible
abstractions, and because the technology underneath improved.

I'm not saying I have a solution, and I agree that this problem rears its
head the most when you start bringing in distributed storage. But my point
stands: these databases run on a highly leaky abstraction, and that's a big
problem going forward.

~~~
ddorian43
Cost of message passing. It's probably because you don't care `enough` about
performance.

See the difference on one box between the `1 process per core` databases
(`scylladb`, `voltdb`, `redis`) and all the other DBs that use
`1-process-for-all-cores`.

------
elvinyung
Needs a (2012).

IMO, while still cool, LSM trees don't feel particularly novel at this point.
It seems like every database and their dog has adopted/made available some
kind of LSM tree storage engine, down to traditional relational databases
(e.g. MySQL with MyRocks).

~~~
saghm
To be fair, RocksDB is a fork of LevelDB, so at least from a historical
perspective this is fairly related to MyRocks.

------
wessorh
why is this here, from 2012?

~~~
jxub
I just wanted to point out that there is a good explanation of SSTables,
B-trees and other database structures in a whole chapter of the great book
Designing Data-Intensive Applications, by Martin Kleppmann.

~~~
jbreiding
In the middle of this book right now, specifically the unreliable clocks part.
The explanations have the right amount of depth. It's a great read so far.

~~~
suj1th
not to mention, some awesome references at the end of each chapter. I had made
it a point after each chapter to randomly pick an interesting reference and
read that as well.

