
Show HN: On-disk B+ tree for Python 3 - nicolaslem
https://github.com/NicolasLM/bplustree
======
laurencerowe
ZODB has a mature B+Tree implementation for on disk use in the BTrees package.
[https://pypi.python.org/pypi/BTrees](https://pypi.python.org/pypi/BTrees)

------
dimatura
Nice, I'm a big user of various data stores for scientific work. How does this
compare to LMDB ([http://www.lmdb.tech/doc/](http://www.lmdb.tech/doc/)),
which uses B-trees (and has a Python interface)?

~~~
nicolaslem
I haven't done any benchmark yet but I expect my implementation to be at least
an order of magnitude slower.

I've found my implementation to be CPU intensive: creating Python objects from
the raw pages is expensive. That's why bulk inserts and iterations are much
faster than insert/get in a loop.

------
sprt
The code looks very clean. Uses type annotations too. Wish there were more
comments though.

~~~
nicolaslem
Thank you! I found that type annotations make code easier to read so I tend to
write less comments. But I'll make an effort.

------
erezsh
How does this perform compared to sqlite3?

And and can you use keys that aren't builtin values?

~~~
nicolaslem
It is possible to use your own key if it has a natural order and you write
your own simple serializer:
[https://github.com/NicolasLM/bplustree/blob/master/bplustree...](https://github.com/NicolasLM/bplustree/blob/master/bplustree/serializer.py)

------
antman
Cannot delete items yet

~~~
nicolaslem
Yes, it's a work in progress, see the `remove` branch.

~~~
IvyMike
That is a very dangerous name for a branch.

~~~
liopleurodon
lol, truth

------
j_s
Are there any production data stores recommended for low memory usage? What
should I use to stream data to disk and back with minimal overhead, preferably
with indexed lookups?

~~~
bufferoverflow
An SSD with the highest random-read/write IOPS you can afford. Something like
this:

[https://www.computerworld.com/article/2987956/solid-state-
dr...](https://www.computerworld.com/article/2987956/solid-state-drives/intel-
samsung-launch-their-fastest-ssds-with-up-to-5gbps-speeds.html)

Or FusionIO ioDrive2 Duo, which is above 900K IOPS for both reads and writes.

------
zerokernel
[https://pypi.python.org/pypi/BTrees](https://pypi.python.org/pypi/BTrees)

------
evolighting
what about using a DB? Is there any really simple database for the same
purpose?

------
coleifer
This is pretty cool. As I was reading the code I really wished for an
explanation or simple ASCII diagram of the serialization formats for the
various node/record types, as well as for the frames/wal format. Given that
the poster is the author of the project, I hope you'll consider filling in
these kinds of details, as they'd presumably be of interest to the people
you're "Show"-ing this to.

~~~
nicolaslem
Thank you. I know what you mean, I'm not happy with how the serialization is
done right now, it's too complicated.

Maybe someone on HN knows a Python serialization library beyond pickle that
would allow to describe how the data is laid out and take care of the rest. It
looks like struct is not flexible enough for this usage.

------
coleifer
Thanks for sharing, I've been interested in finding pure python
implementations of storage engines. Another one I came across is whoosh:
[https://bitbucket.org/mchaput/whoosh/src/a16ebacb47191afaf2d...](https://bitbucket.org/mchaput/whoosh/src/a16ebacb47191afaf2d1fda8b1fcd84e6981582f/src/whoosh/filedb/?at=default)

~~~
shalabhc
Check out zodb and durus.

------
halayli
What's wrong with LMDB?

