
How does SQLite work? Part 2: btrees - todsacerdoti
https://jvns.ca/blog/2014/10/02/how-does-sqlite-work-part-2-btrees/
======
sgummaluri
Part one of this[0] is discussed here[1]

[0][https://jvns.ca/blog/2014/09/27/how-does-sqlite-work-part-1-pages/](https://jvns.ca/blog/2014/09/27/how-does-sqlite-work-part-1-pages/)

[1][https://news.ycombinator.com/item?id=23663071](https://news.ycombinator.com/item?id=23663071)

------
fnord123
Reminder: when anyone says B-Tree in the context of databases they mean
B+Tree.

And I'll never understand how efficient B+Tree implementations are not in the
standard library of every language/runtime.

~~~
BiteCode_dev
I can think of 3 main use cases for them:

- databases

- file systems

- hash maps

The first 2 are not something you code every day, so if you go for that,
installing a dependency that does it is not a problem.

The last is usually included in all languages.

Why would you need them available in all languages?

~~~
fnord123
They're very good all-around structures when you want to store anything, i.e.
good for any container; just requires an orderable key (or one that can
support `contains` and then use a GiST). If the key is orderable then you can
iterate over spans with better cache locality.

I think they're only used for databases and file systems because those are
cases where people actually bother to write them since they care about
performance. If they were in the stdlib then they'd be used for many more
things.

~~~
BiteCode_dev
I get the point, but I rarely find myself in a case where I even need an
_ordered_ hash map, despite the fact that my language provides one by default.

I can't remember one time in the last decade where I was in need of a _sorted_
hash map.

And if it ever happens, I can easily install one as a dependency.

I'm unsure whether having something even more general like a b+tree would help.

It's possible that, because I'm not used to having them handy, I don't think
about them when I encounter a problem for which they would be perfect.

But I never felt like I missed them.

Can you give me a few examples of problems you used them for in the past year?

Was that with a high or low level language?

~~~
fnord123
Let's get all the records from 2020-06-28T10:00:00.000Z to
2020-06-29T09:59:59.999Z and sum the quantity field. Or anywhere you would
want to group_by. Or anywhere you might use something like an RDD that
partitions the data to compute something for each chunk of data (in parallel)
(also after a group_by).
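
To make the time-range case concrete, here is a minimal sketch. Python's stdlib has no B-tree, so a sorted list plus `bisect` stands in for an ordered map; the records and the `sum_quantity` helper are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (timestamp, quantity) records, kept sorted by timestamp.
# Fixed-width ISO-8601 UTC strings sort lexicographically in time order.
records = [
    ("2020-06-28T09:59:59.999Z", 5),
    ("2020-06-28T10:00:00.000Z", 2),
    ("2020-06-28T18:30:00.000Z", 7),
    ("2020-06-29T09:59:59.999Z", 1),
    ("2020-06-29T10:00:00.000Z", 4),
]
keys = [ts for ts, _ in records]

def sum_quantity(start, end):
    """Sum quantity over records with start <= timestamp <= end."""
    lo = bisect_left(keys, start)   # first index whose key is >= start
    hi = bisect_right(keys, end)    # first index whose key is > end
    return sum(qty for _, qty in records[lo:hi])

total = sum_quantity("2020-06-28T10:00:00.000Z", "2020-06-29T09:59:59.999Z")
print(total)  # 10
```

The two binary searches find the ends of the span and everything in between is contiguous, which is the same access pattern an ordered tree gives you. The sorted list only stands in for the lookup side: inserting into it is O(n), whereas a B(+)tree keeps inserts logarithmic.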

Also, sparse vectors. You don't have a value to store for each index so you
use a BTreeMap. I've done this using sentinel values in an array, but that's
gross depending on how sparse and how large the vector is.
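
A rough sketch of why ordered keys help with sparse vectors: if each vector is stored as (index, value) pairs in index order, which is what in-order iteration over a BTreeMap would yield, a dot product is a single merge pass. The plain sorted lists and the `sparse_dot` name below are assumptions for illustration.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors, each a list of (index, value)
    pairs sorted by index. Missing indices are treated as zero."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia == ib:
            total += va * vb   # both vectors have an entry at this index
            i += 1
            j += 1
        elif ia < ib:
            i += 1             # index only present in a
        else:
            j += 1             # index only present in b
    return total

u = [(3, 2.0), (1000, 1.5), (70000, 4.0)]
v = [(3, 0.5), (512, 9.0), (70000, 2.0)]
print(sparse_dot(u, v))  # 9.0
```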

In Python I would just use a Dataframe (pandas), but in Rust I would use
BTreeMap. I wouldn't wish for all languages to have a dataframe type in them
since they're such complicated beasts with tricky APIs (pandas and R are such
acquired tastes) so I think people can get quite far with B(+)Trees. So that's
what I wish was in every stdlib.

~~~
BiteCode_dev
I don't know, wouldn't dumping the data into a db be easier? If it fits in
SQLite, you already have a solution to this problem in most languages.

Technically you would be using a btree, but through a higher-level interface,
like with a fs or hashmap. Which is kinda my point.

I don't say it's not a good DS, but having it in every language's stdlib seems
overkill given the tools we have today.

I'm a Python guy, and even without pandas it has the heapq module, which is a
b-tree, yet I never really had a reason to use it.

~~~
fnord123
Heapq is a priority queue implemented on a binary tree; not a btree.
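
A small sketch of the difference, using only the stdlib `heapq` module (the sample values are arbitrary). The heap lives in a plain list where the children of index i sit at 2*i+1 and 2*i+2, and the only ordering guarantee is that each parent is <= its children.

```python
import heapq

values = [9, 4, 7, 1, 8, 2]
heap = []
for v in values:
    heapq.heappush(heap, v)

def is_min_heap(a):
    """Check the heap invariant: every parent is <= its children."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

smallest = heap[0]                      # the minimum is always at index 0
heap_ok = is_min_heap(heap)             # invariant holds after the pushes
already_sorted = heap == sorted(heap)   # the list itself is not in sorted order

# Sorted order only comes out by popping, which destroys the heap.
drained = [heapq.heappop(heap) for _ in range(len(heap))]

print(smallest, heap_ok, already_sorted)  # 1 True False
print(drained)                            # [1, 2, 4, 7, 8, 9]
```

So a heap gives cheap access to the minimum, but no in-order iteration and no range lookup without scanning, which is what the ordered-map use cases above rely on.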

> I don't say it's not a good DS, but having it in every language's stdlib seems
> overkill given the tools we have today.

C++ has map and Java has TreeMap. These are both red-black trees (which are
equivalent to btrees where k=4). So there's no question of whether there are
ordered mapping types in these languages. IMO they'd be better as b(+)trees
with a wider fanout. That's all.
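
To put rough numbers on "wider fanout", here is a back-of-the-envelope sketch. It assumes an idealized, completely full search tree where each node has `fanout` children and `fanout - 1` keys; real red-black trees and B+trees (which keep values only in leaves) will differ somewhat, and `min_levels` is a made-up helper.

```python
def min_levels(n_keys, fanout):
    """Fewest levels an idealized search tree needs to hold n_keys keys,
    assuming each node has up to `fanout` children and `fanout - 1` keys."""
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        # A completely full tree with this many levels holds fanout**levels - 1 keys.
        capacity = fanout ** levels - 1
    return levels

for fanout in (2, 4, 64):
    print(fanout, min_levels(1_000_000, fanout))
# 2 20
# 4 10
# 64 4
```

Each level is roughly one pointer chase and a likely cache miss, so for a million keys the wide node gets there in about 4 hops instead of about 20, with the extra comparisons happening inside one contiguous node.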

