
BTrDB: Berkeley Tree Database - tectonic
http://btrdb.io/
======
lifeisstillgood
As someone who generally finds "a couple million rows? SQLite is _fine_"
works as a philosophy, what should I generally know here?

I get that there are huge timeseries-based data flows going around the world
(transactions, metrics, weather, self-driving telemetry) and sticking those
into something is useful.

I understood the world has settled on "use HDFS to store the data, use Apache
Spark to run over those files and reduce your query data set until it is small
enough to fit into pandas, and then process".

This looks like it is trying to cut out the first two steps?

~~~
gopalv
> what generally should I know here?

The "real useful" part was a click further in for me.

[https://github.com/BTrDB/smartgridstore/blob/master/tools/c3...](https://github.com/BTrDB/smartgridstore/blob/master/tools/c37ingress/pmu.go#L204)

Ingress endpoints which is usually the more annoying component, rather than
the exact storage format.

> As someone who generally finds "a couple million rows? SQLite is fine"

There are still unsolved annoyances with analog timeseries measurements which
aren't easily solved in SQL with columns + a btree index.

So combining an airplane altitude sensor at 0.1 Hz with a 10 Hz fuel sensor
starts giving you trouble in SQL, because the question someone might ask is
"when the plane was at 31,000 feet, at 0.85 Mach, what was the fuel rate?",
while the plane might have gone from 30,999 to 31,001 feet without ever
recording the intermediate point.

Put another way, you can do range scans for each measurement if you treat
time as something different from a generic column value in a generic SQL
engine (joining rows by temporal proximity).
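
The usual workaround outside of a SQL engine is an "as of" join: walk both
streams in time order and pair each fast-rate sample with the most recent
slow-rate sample at or before it. A minimal sketch in Go (the `Sample` type,
millisecond timestamps, and the data are illustrative assumptions, not
anything from BTrDB):

    package main

    import "fmt"

    // Sample is one timestamped sensor reading; T is in milliseconds.
    type Sample struct {
        T int64
        V float64
    }

    // asOfJoin pairs each fuel sample with the latest altitude sample
    // whose timestamp is <= the fuel timestamp. Both inputs must be
    // sorted by time; the scan is a single O(n+m) merge.
    func asOfJoin(altitude, fuel []Sample) [][2]Sample {
        out := make([][2]Sample, 0, len(fuel))
        i := 0
        for _, f := range fuel {
            for i+1 < len(altitude) && altitude[i+1].T <= f.T {
                i++
            }
            if len(altitude) > 0 && altitude[i].T <= f.T {
                out = append(out, [2]Sample{altitude[i], f})
            }
        }
        return out
    }

    func main() {
        alt := []Sample{{0, 30999}, {10000, 31001}}             // 0.1 Hz
        fuel := []Sample{{0, 1.20}, {100, 1.21}, {10100, 1.19}} // 10 Hz
        for _, p := range asOfJoin(alt, fuel) {
            fmt.Printf("t=%dms alt=%.0fft fuel=%.2f\n", p[1].T, p[0].V, p[1].V)
        }
    }

Answering "when was the plane at exactly 31,000 feet" would additionally
require interpolating between the 30,999 and 31,001 readings, which is
precisely what a generic btree index over a time column doesn't give you.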

The ASF incubator proposal for IoTDB [1] might make interesting reading on
the topic (the frequencies part), as might the Boeing talk from XLDB 2018 [2].

[1] - [https://wiki.apache.org/incubator/IoTDBProposal](https://wiki.apache.org/incubator/IoTDBProposal)

[2] - [https://conf.slac.stanford.edu/xldb2018/sites/xldb2018.conf.slac.stanford.edu/files/Tues_15.50%20Ian%20Willson%20XLDB%202018%20Boeing%20IOT.pdf](https://conf.slac.stanford.edu/xldb2018/sites/xldb2018.conf.slac.stanford.edu/files/Tues_15.50%20Ian%20Willson%20XLDB%202018%20Boeing%20IOT.pdf)

~~~
SahAssar
> So combining an airplane altitude sensor at 0.1 Hz with a 10 Hz fuel sensor
> starts giving you trouble in SQL, because the question someone might ask is
> "when the plane was at 31,000 feet, at 0.85 Mach, what was the fuel rate?",
> while the plane might have gone from 30,999 to 31,001 feet without ever
> recording the intermediate point.

I thought that was part of the problem that Postgres range types solve?
[https://www.postgresql.org/docs/current/rangetypes.html](https://www.postgresql.org/docs/current/rangetypes.html)

~~~
dboreham
Type is one thing. Efficient querying (which implies some appropriate index
structure) is another.

~~~
asah
It's a weak example to ask _precisely_ when the plane is at X feet, not an
inch higher or lower - this is more typically a range query, and the Postgres
range datatype is quite good at this, notably GiST indexes for overlap
queries:
[https://www.postgresql.org/docs/9.3/functions-range.html#RANGE-OPERATORS-TABLE](https://www.postgresql.org/docs/9.3/functions-range.html#RANGE-OPERATORS-TABLE)
[https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-INDEXING](https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-INDEXING)
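
For concreteness, here is a hedged sketch of that approach from application
code, assuming hypothetical `altitude` and `fuel` tables (the schema, names,
and connection string are all made up for illustration); the `@>` containment
operator is what the GiST index accelerates:

    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "time"

        _ "github.com/lib/pq" // Postgres driver
    )

    func main() {
        // Assumed schema, not from the thread:
        //   CREATE TABLE altitude (valid tsrange NOT NULL, feet float8 NOT NULL);
        //   CREATE INDEX ON altitude USING gist (valid);
        //   CREATE TABLE fuel (at timestamp NOT NULL, rate float8 NOT NULL);
        db, err := sql.Open("postgres", "dbname=flights sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // "@>" asks: does the altitude interval contain this fuel sample's
        // timestamp? The GiST index answers that without a sequential scan.
        rows, err := db.Query(`
            SELECT f.at, f.rate, a.feet
            FROM fuel f
            JOIN altitude a ON a.valid @> f.at
            WHERE a.feet BETWEEN 30900 AND 31100`)
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()
        for rows.Next() {
            var at time.Time
            var rate, feet float64
            if err := rows.Scan(&at, &rate, &feet); err != nil {
                log.Fatal(err)
            }
            fmt.Println(at, rate, feet)
        }
    }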

------
reitzensteinm
The nodes of the tree resemble Clojure's HAMTs, just skipping the hash. It's
interesting, because Clojure itself opts for RB trees when building sorted
maps, but for keys that can be cast to a bit sequence (strings, integers),
AMTs are a pretty good option.

My side project is in some respects quite close to BTrDB: tree-based
analytics. Stumbling across it made my day, because it appears to have
vindicated some design choices I was unsure about.

I've reimplemented Clojure's data structures in Clojure, and made the nodes
serialize to a garbage collected KV backend (ardb or Redis depending on
requirements). This means that they can persist between program runs, be sent
over a network, and don't need to fit in memory. Using structural sharing, you
can take a big hash map, add a single key to it, and have both the old and the
new side by side.

The next piece is cached map/filter/reduce, which saves intermediate results
on each node, letting batch jobs compute incrementally. You can take a big
hash map on disk (my main one contains ~20 GB, a map of string -> set of data
points), add some more values to it, and recalculate the analytics of the new
map in a few milliseconds, having both side by side.

The ultimate goal is to enable you to write more or less idiomatic Clojure
programs that behave the same whether you're reducing a map with a thousand
entries, or a trillion (e.g. by firing up a bunch of AWS Lambda instances).
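
Since the project isn't public, here is only a toy sketch of those two ideas
in Go, with a plain unbalanced binary tree standing in for the HAMT: path
copying gives the structural sharing, and a cached aggregate on each node
means an update recomputes only the nodes along the changed path, so a reduce
over the new version is nearly free:

    package main

    import "fmt"

    // node is one tree node; sum caches the reduce (+) over its subtree.
    type node struct {
        key         string
        val         float64
        left, right *node
        sum         float64
    }

    func sum(n *node) float64 {
        if n == nil {
            return 0
        }
        return n.sum
    }

    // insert returns a NEW root. Only nodes on the path down are copied;
    // every untouched subtree is shared with the old tree, so the old and
    // new versions coexist cheaply (structural sharing).
    func insert(n *node, key string, val float64) *node {
        if n == nil {
            return &node{key: key, val: val, sum: val}
        }
        cp := *n // path copy
        switch {
        case key < n.key:
            cp.left = insert(n.left, key, val)
        case key > n.key:
            cp.right = insert(n.right, key, val)
        default:
            cp.val = val
        }
        // Recomputing the cached reduce is O(1) per copied node, so an
        // update refreshes only the changed path, not the whole tree.
        cp.sum = cp.val + sum(cp.left) + sum(cp.right)
        return &cp
    }

    func main() {
        var old *node
        for i, k := range []string{"m", "f", "t", "a", "z"} {
            old = insert(old, k, float64(i))
        }
        updated := insert(old, "q", 100)
        fmt.Println(sum(old), sum(updated)) // 10 110 -- both versions live
    }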

~~~
lifeisstillgood
This sounds fascinating - is your side project open anywhere?

~~~
reitzensteinm
Not yet. I'm using it in prod for my own analytics, so I know it works, but
it'll need some clean up and features before release. I do plan on it though.

------
roskilli
Those interested in TSDBs might also be interested in M3DB.io, which is
Apache 2 licensed and supports a modified TSZ compression implementation with
an 11x compression ratio, replication of data within the cluster, streaming
of data between nodes on add/remove, and a Kubernetes operator.

~~~
sa46
The compression technique used to achieve the 11x compression ratio is pretty
clever and is detailed in the Gorilla TSDB paper. Timestamps are encoded with
a delta-of-delta encoding scheme, which allows 96% of all timestamps to
compress to a single bit within a compressed block.

Gorilla paper:
[https://cs.brown.edu/courses/csci2270/papers/gorilla.pdf](https://cs.brown.edu/courses/csci2270/papers/gorilla.pdf)
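
Sketched loosely in Go below; the bucket bit widths follow the paper, but the
real encoder packs these into a bitstream and special-cases the first delta,
both omitted here. Store the first timestamp raw, then encode the difference
between consecutive deltas; for a fixed-rate sensor that difference is almost
always zero, which is the single-bit case:

    package main

    import "fmt"

    // bitsPerTimestamp returns, for illustration, how many bits each
    // timestamp would cost under a Gorilla-like delta-of-delta scheme.
    func bitsPerTimestamp(ts []int64) []int {
        costs := make([]int, 0, len(ts))
        var prevDelta int64
        for i, t := range ts {
            if i == 0 {
                costs = append(costs, 64) // first timestamp stored raw
                continue
            }
            delta := t - ts[i-1]
            dod := delta - prevDelta
            prevDelta = delta
            switch {
            case dod == 0:
                costs = append(costs, 1) // the common case: a single '0' bit
            case dod >= -63 && dod <= 64:
                costs = append(costs, 2+7) // '10' prefix + 7-bit value
            case dod >= -255 && dod <= 256:
                costs = append(costs, 3+9) // '110' prefix + 9-bit value
            case dod >= -2047 && dod <= 2048:
                costs = append(costs, 4+12) // '1110' prefix + 12-bit value
            default:
                costs = append(costs, 4+32) // '1111' prefix + 32-bit value
            }
        }
        return costs
    }

    func main() {
        // One sample every 10s, with one sample arriving a second late.
        ts := []int64{0, 10, 20, 30, 41, 51, 61}
        fmt.Println(bitsPerTimestamp(ts)) // [64 9 1 1 9 9 1]
    }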

~~~
roskilli
It is definitely a huge step forward for time series data collection at
scale; great props to the authors.

The major difference between M3TSZ and vanilla TSZ is in the value component:
instead of XORing the float bits of each datapoint, the encoder determines
whether the float's precision fits within a few significant digits. If so,
the value is turned into an int representation, and the delta-of-delta of
that int from value to value is used to represent changes, rather than the
XORed float bits. It also tracks how many significant digits are in use, and
dampens the need to encode a change in that count from value to value.

We should write up the specific algorithm differences for the value component
encoding.

You can see the relative code here in "writeIntVal":
[https://github.com/m3db/m3/blob/b25e111aab06fb7cde9f7e1be8b6...](https://github.com/m3db/m3/blob/b25e111aab06fb7cde9f7e1be8b651946c21f341/src/dbnode/encoding/m3tsz/encoder.go#L198-L229)
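
A loose sketch of that value path in Go; this shows the idea only, not
M3TSZ's actual bit layout or its significant-digit tracking (see writeIntVal
above for the real thing). Check whether the float round-trips through a
small decimal scale, and if so encode the delta-of-delta of the scaled
integers:

    package main

    import (
        "fmt"
        "math"
    )

    // asScaledInt reports whether v is exactly representable with the
    // given number of decimal digits, returning the scaled integer.
    func asScaledInt(v float64, digits int) (int64, bool) {
        scaled := v * math.Pow10(digits)
        rounded := math.Round(scaled)
        if math.Abs(scaled-rounded) < 1e-9 {
            return int64(rounded), true
        }
        return 0, false
    }

    func main() {
        vals := []float64{3.14, 3.15, 3.17, 3.18} // two decimal digits
        var prev, prevDelta int64
        for i, v := range vals {
            n, ok := asScaledInt(v, 2)
            if !ok {
                fmt.Println("fall back to XOR float encoding for", v)
                continue
            }
            if i > 0 {
                delta := n - prev
                fmt.Printf("%v -> int %d, delta %d, delta-of-delta %d\n",
                    v, n, delta, delta-prevDelta)
                prevDelta = delta
            }
            prev = n
        }
    }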

------
galacticdessert
Unfortunately the website is a bit light on details, but I found some
interesting information elsewhere:

[https://blog.acolyer.org/2016/05/04/btrdb-optimizing-storage-system-design-for-timeseries-processing/](https://blog.acolyer.org/2016/05/04/btrdb-optimizing-storage-system-design-for-timeseries-processing/)

[https://pingthings.io/platform.html](https://pingthings.io/platform.html)

It looks very interesting, and I am a bit surprised to see how a tree
structure can deliver such high performance in retrieving the data. Querying
a long period of time means that each top node has to be walked to the
bottom, right? Isn't that fairly expensive?

The aggregations at the node level are also interesting. That is something
the company I work for ([https://www.energy21.com](https://www.energy21.com))
also does for calculations, but we use a relational database and a
notification-based system to perform the recalculations and keep the
aggregations up to date.

Cool stuff.

------
someguy13
You should also look at pingthings.io, which is a commercialization of BTrDB

~~~
jedisct1
Where is the pingthings source code, since BTrDB is GPL?

~~~
bin0
[https://github.com/PingThingsIO](https://github.com/PingThingsIO)

Literally result number three for "pingthings.io source code"

------
bin0
I read about the kdb+ database in a post the other day, billed as "the
fastest" database, and the numbers given did seem incredibly impressive. How
does this compare?

~~~
marknadal
I wish everybody would just follow Redis' setup for baseline tests:

[https://redis.io/topics/benchmarks](https://redis.io/topics/benchmarks)

    
    
> This is an example of running the benchmark in a MacBook Air 11" using a pipelining of 16 commands:

    $ redis-benchmark -n 1000000 -t set,get -P 16 -q
    SET: 403063.28 requests per second
    GET: 508388.41 requests per second

Testing against a MacBook Air is reasonable, as most developers will have
access to one, or to a Windows machine with similar specs.

For instance, I have a 5-year-old MacBook Air, and I run this test to
benchmark (baseline) my database:

[https://htmlpreview.github.io/?https://github.com/amark/gun/...](https://htmlpreview.github.io/?https://github.com/amark/gun/blob/master/test/ptsd/ptsd.html)

My test / the Redis test is not very interesting (not a real-world use case),
but it at least gives a baseline that removes a lot of variables.

Then every database should implement Jepsen-like tests for everything else.
That is what we're trying to do by building a Jepsen-like distributed-systems
correctness and load-testing tool
([https://github.com/gundb/panic-server](https://github.com/gundb/panic-server))
in a language that is easy for any developer to access and use.

~~~
pritambaral
> ... as most developers will have access to one or a windows machine of
> similar stats.

That's a bold assumption. Looking around my office in India, I find not a
single MacBook, nor a Windows machine of similar specs, nor any machine from
'11. It would also be trouble to get my hands on a MacBook Air '11 if I had
to for the purposes of reproducing a benchmark. I certainly couldn't buy one
today, and even if I were to buy a currently-in-stores MacBook it'd be
awfully expensive.

A much better proposition would be something far more ubiquitous and
accessible, something that can be borrowed trivially or bought easily by the
vast majority of people, something that stays stable and available for a
number of years. An AWS instance or a Raspberry Pi (1/2/3), for instance.

~~~
marknadal
Thank you for calling this out. It was the spirit of what I was trying to go
for, because the website uses "60GB of RAM (EC2 c3.8xlarge)", which I
wouldn't even pay for.

I totally agree with the RasPi suggestion. Though note, the RasPi 3 B+ is
actually about as powerful as my 5-year-old Air: 1.4GHz quad-core vs 1.6GHz
quad-core.

~~~
pritambaral
> the website uses "60GB of RAM (EC2 c3.8xlarge)", which I wouldn't even pay for

The point isn't to restrict benchmarks to only high-end machines. The point is
to use a _stable standard_ that people wouldn't have trouble running for a few
hours. I can rent a machine of that identical type for 20 hours, or buy an
RPi, for the low price of $35.

> about as powerful as my 5 year old air, 1.4GHz quadcore vs 1.6GHz quadcore

Core counts and clock speeds do not translate to comparable computing power
across processor families and instruction sets. E.g., the RPi 4 (1.5GHz,
quadcore) is announced to deliver 3x the performance of an RPi 3B+ (1.4GHz,
quadcore), and that's on the same instruction set.

------
mzs
Berkeley Tree DB uses GPLv3, go figure

[https://github.com/BTrDB/btrdb-server/blob/master/LICENSE](https://github.com/BTrDB/btrdb-server/blob/master/LICENSE)

~~~
ksec
So even Berkeley has abandoned BSD? Or is GPLv3 a requirement of the U.S.
Department of Energy ARPA-E program (DE-AR0000430) or the National Science
Foundation grant CPS-1239552?

~~~
cat199
'Berkeley' never used only the BSD license.

BSD UNIX used the BSD license (among other things).

There are, were, and always will be many projects, departments, professors,
etc. at UCB doing things that need a license, and each will choose the
appropriate one for their circumstances.

------
fnord123
Is this in-memory like BDB? It says it uses HTTP and Cap'n Proto, which would
indicate that it's not in-memory.

~~~
kentonv
Looks like they switched to gRPC and removed Cap'n Proto in late 2016 / early
2017, but never updated the web page.

[https://github.com/BTrDB/btrdb-server/commit/23fda10224bea80dfe397d0fa061009972b25f1e](https://github.com/BTrDB/btrdb-server/commit/23fda10224bea80dfe397d0fa061009972b25f1e)

[https://github.com/BTrDB/btrdb-server/commit/7bb3c8d39b0954e92021043345c12c9f3e4bb8d6](https://github.com/BTrDB/btrdb-server/commit/7bb3c8d39b0954e92021043345c12c9f3e4bb8d6)

I'm unable to find any public discussion about this change or its motivations.
Oh well.

~~~
fnord123
That's an interesting choice for a system intended to be high performance.
Go's GRPC implementation has been suffering from extremely high resource usage
due to [https://github.com/grpc/grpc-
go/issues/1455](https://github.com/grpc/grpc-go/issues/1455)

------
sushilari
Are use-case-driven DBs helpful? Are the use case and possible adoption crowd
large enough to sustain a product this critical down the road?

I worry about the proliferation of infra-level tools, and about adopting a
new one. Why not make an extension for, or a fork of, Postgres to demonstrate
the concept? (Just an example.)

~~~
unchurched
Timeseries databases have an entirely different set of technical challenges
from relational databases. Also, I wouldn’t exactly call this proliferation.
There are relatively few decent timeseries databases in existence, some of
which have been around since 1999.

So yes, timeseries databases are incredibly helpful if you have to store,
well, large amounts of timeseries data quickly.

------
ashish01
> precomputed statistics over power-of-two-aligned time ranges

A simple Google search didn't turn up much detail. Can someone please add
some details or links for this?

~~~
hinkley
Does this blurb from the about page cover it for you?

> BTrDB uses a K-ary tree to store timeseries data. The leaves of the tree
> store the individual time-value pairs. Each internal node stores associative
> statistics of the data in its subtree; currently, the statistics are the
> minimum value, mean value, maximum value, and number of data points. BTrDB
> uses the internal nodes to accelerate processing of these statistical
> aggregates over arbitrary time ranges.
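
That is essentially a segment tree over time. A toy binary version in Go
(BTrDB's real tree is K-ary over power-of-two-aligned spans and stores the
mean directly; here the mean falls out of sum/count): a query over an
arbitrary range touches only O(log n) nodes, because any subtree that lies
entirely inside the range is answered from its cached statistics without
descending to the leaves:

    package main

    import "fmt"

    // stats are the associative aggregates each internal node caches.
    type stats struct {
        count    int
        min, max float64
        sum      float64 // mean = sum / count
    }

    func merge(a, b stats) stats {
        if a.count == 0 {
            return b
        }
        if b.count == 0 {
            return a
        }
        out := stats{count: a.count + b.count, min: a.min, max: a.max, sum: a.sum + b.sum}
        if b.min < out.min {
            out.min = b.min
        }
        if b.max > out.max {
            out.max = b.max
        }
        return out
    }

    // node covers the time span [lo, hi), one value per tick.
    type node struct {
        lo, hi      int64
        st          stats
        left, right *node
    }

    func build(vals []float64, lo int64) *node {
        hi := lo + int64(len(vals))
        if len(vals) == 1 {
            return &node{lo: lo, hi: hi, st: stats{1, vals[0], vals[0], vals[0]}}
        }
        mid := len(vals) / 2
        l := build(vals[:mid], lo)
        r := build(vals[mid:], lo+int64(mid))
        return &node{lo: lo, hi: hi, st: merge(l.st, r.st), left: l, right: r}
    }

    // query aggregates over [lo, hi), stopping at fully covered subtrees.
    func (n *node) query(lo, hi int64) stats {
        if n == nil || hi <= n.lo || lo >= n.hi {
            return stats{} // no overlap
        }
        if lo <= n.lo && n.hi <= hi {
            return n.st // whole subtree inside the range: no descent
        }
        return merge(n.left.query(lo, hi), n.right.query(lo, hi))
    }

    func main() {
        root := build([]float64{5, 1, 4, 2, 8, 7, 3, 6}, 0)
        s := root.query(2, 7) // covers the values 4, 2, 8, 7, 3
        fmt.Printf("count=%d min=%v max=%v mean=%v\n",
            s.count, s.min, s.max, s.sum/float64(s.count))
    }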

~~~
oofabz
So basically, a high tech RRDtool?

~~~
hinkley
I wasn’t satisfied by the density of information on that site. In fact, that
paragraph was the closest thing to information I found on the whole site.

Pretty much the sort of thing my coworkers would try to pass off as
“documentation”, which someone else would then have to redo for them...

------
marcrosoft
Please change the logo. It is an obvious rip-off of Sequel Pro.

~~~
bourgoin
It's the same meme - a stack of pancakes as a visual pun on "software
stack"/"full stack." Not the only place I've seen that before, either - I've
seen the same idea on T-shirts and laptop stickers.

Funnily enough, I went to sequelpro.com/legal and found that while their
software is under the MIT license, as opposed to the more restrictive GPL
license used by BTrDB, the website specifically calls out the icon as
property of the project, and its use in forks is disallowed. So it looks like
they really would care about the logo being ripped off, more than anything
else.

