
I'm jealous you got to hear Dr. Hipp, that sounds cool. Would love to hear more about the circumstances :)

Regarding the LSM engine, you can find all the relevant implementation details here: https://sqlite.org/src4/doc/trunk/www/lsm.wiki#summary

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system. Under normal circumstances, the in-memory tree is not allowed to grow very large.




I stumbled on this video presentation by Dr. Hipp recently (may have been from another HN comment). I really enjoyed it, probably because he seems so enthusiastic and passionate.

SQLite: The Database at the Edge of the Network with Dr. Richard Hipp

https://m.youtube.com/watch?v=Jib2AmRb_rk


I got lucky! It was a class event, and he's known to speak at databases / data systems classes. He's a very fun (and opinionated!) speaker.

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system.

Hmm, ok, so this contradicts my assumption. Actually, now that I think about it, other LSMs like rocksdb / leveldb work like this too (library-like model with some in-memory component when you "open" the database).

Anyway, without diving into the details of the code, there are other technical decisions that would affect this stuff.

One key thing is how big your in-memory structure is (in relation to available memory and insertion workload) and how often you flush to disk. Another is what your LSM tree looks like: aside from the data structure, how many tiers/levels you have is a big thing. I assume some of these are configurable parameters. E.g. rocksdb has an enormous set of parameters that handle this stuff. It's also annoying to tune.
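To make the buffer-size / flush-frequency tradeoff concrete, here's a toy sketch in Python. It is not how SQLite4's LSM or rocksdb actually work (the SQLite4 docs describe a red-black tree; this uses a plain dict), and `TinyLSM` and `memtable_limit` are names I made up for illustration. The point is just that a bigger in-memory buffer means fewer flushes, but more sorted runs pile up unless something merges them.

```python
import bisect


class TinyLSM:
    """Toy LSM: a bounded in-memory buffer plus immutable sorted runs.

    Hypothetical illustration only; real engines add a write-ahead log,
    on-disk files, and background merging of runs into levels.
    """

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # made-up tuning knob
        self.memtable = {}                    # stand-in for the in-memory tree
        self.runs = []                        # sorted (key, value) lists, newest last

    def put(self, key, value):
        self.memtable[key] = value
        # Flush once the buffer reaches its limit.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the buffer first, then runs newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

With `memtable_limit=2`, two puts trigger a flush and a third put stays in memory; a read of an overwritten key returns the newer in-memory value. Reads get slower as `runs` grows, which is the part the number of tiers/levels (and compaction) is meant to control.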

I found this benchmark here that is illustrative: https://sqlite.org/src4/doc/trunk/www/lsmperf.wiki

The first graph is underwhelming, but when you adjust the buffers (look at the last graph) you get ~250k writes / second, constant regardless of database size (this is why you want an LSM tree), which is darned good! And this is on a spinny drive, not an SSD. Their "large" buffer sizes aren't even that large IMHO.

So maybe his mention that the LSM storage was underwhelming was overblown :-) I don't know.

Another difference: other LSM-based systems that aren't just key-value are usually column stores, where you keep a separate LSM for each column family (could be 1-n columns). But I can't think off the top of my head how this would cause a difference. Perhaps in how reads happen - the query engines work quite differently.

Anyway, my talk is cheap, I'm just guessing here, actually doing the analysis is the hard work :-) Also, I'm something of an amateur currently, so take my words with a grain of salt. Anyone else have any ideas re: this?


This is the other perf graph: https://sqlite.org/src4/wiki?name=LsmPerformance


What would the performance be for SQLite3 in comparable scenarios? I don't see anything comparative on that benchmarks page.


I've gotten it to do around 40-50k inserts / sec but that's a different scenario - NFS drive, different table and indexes, different queries, different configuration, etc. Also I didn't know if he meant that the inserts were disappointing or the overall results were (e.g. a suite of tests including reads / writes of all sorts).
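If anyone wants a rough number for their own machine, something like this Python sketch would do. It's not a reproduction of the benchmarks linked above: the table layout, the row count, and the `measure_inserts` name are all my own assumptions, and the result will swing a lot with the disk, journal settings, and indexes.

```python
import sqlite3
import time


def measure_inserts(n_rows=10000, path=":memory:"):
    """Insert n_rows in one transaction; return (row_count, rows_per_second).

    Hypothetical helper for a rough comparison only. Pass a file path
    instead of ":memory:" to include real disk I/O.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v TEXT)")
    rows = [(i, "value-%d" % i) for i in range(n_rows)]

    start = time.perf_counter()
    with conn:  # one transaction for the whole batch, committed on exit
        conn.executemany("INSERT INTO kv (k, v) VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
    conn.close()
    rate = n_rows / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

Wrapping the batch in a single transaction matters a lot here; committing per row measures fsync cost more than insert cost.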


I just want to hijack this thread to say Hipp's other creation, Fossil SCM, is a great SCM. Better than git imo. Everyone should check it out.


We've been using it for the last 3/4 years. It's great and really user friendly, with integrated help and a web interface - everything in a single binary. The trouble with it is that it gets slower once you have a lot of history and many files. I don't know if you can use it for huge things like the Linux kernel or the FreeBSD ports tree. I once tried to import the ports tree into fossil and gave up after 2G and an hour. It will import anything that can do git fast-export. It now imports svn dumps as well. Fossil is a very good replacement for SVN. You can set up a central repo where everyone syncs on commit and update.


To be fair, Fossil's intended use case is the exact opposite of the Linux kernel. See 3.3 and 3.4 of the fossil vs git page[0].

[0] https://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.w...


I find most of the differences listed there contrived.

One big difference is that fossil includes wiki and ticketing.

Philosophy differs: Fossil intentionally limits "distributedness". For example, fossil push/pull always transfers all branches with their name. Private branches are considered an anti-feature.

Minor differences are the licence (GPL vs BSD) and the data store (files vs sqlite). Under some circumstances these details matter, but not for the majority of developers.

The rest is not significant, imho. For example, "Lots of little tools vs stand-alone executable". Who cares? In both cases you type "$VCS $SUBCOMMAND $PARAMETERS" commands.


You're right, philosophy differs. I generally dislike private branches. It goes back to the origin of git -- intended for the linux kernel, one of the most widely used open source projects with tens of thousands of contributors. Linus doesn't want or need to see a million private branches. None of the projects I work on are of that scale. When your team is under a dozen people, being able to see what your coworkers are playing with in real time (autosync) is actually incredibly useful.

Stand-alone executable is pretty significant. Git is available on most servers -- fossil is not. If it's packaged in your OS, it's often outdated. Stand-alone kinda makes up for this as you can easily get the latest version with a wget & chmod on any computer, on all 3 platforms.

As for sqlite, it is an astoundingly solid rdbms that is well battle-tested. I consider that a big difference.


Why does everybody persist in calling the great man Dr.?

He refers to himself as D. Richard Hipp.


D is an initial, but Dr. D. Richard Hipp has a PhD from Duke - he graduated in 1992 (his thesis can be found here - it's worth a read: http://bit.ly/2ygiDWx ), hence the Dr.


Thank you. Always thought it was a weird comprehension error from the slightly unusual character-combo.

The weird error may have been mine.


Ooops!



