
Beringei: A high-performance time series storage engine - gricardo99
https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/?utm_source=codedot_rss_feed&utm_medium=rss&utm_campaign=RSS+Feed
======
crescentfresh
Ok so it's an in-memory product, sharded no less.

They speak about compressing the data before "storing" it.

I don't have a lot of experience with in-memory anything, but are we talking
about retaining the compressed format in server memory here? I.e., RAM is your
datastore.

Then, at some point, to serve requests/queries for the data, don't you have to
get it "out of" RAM and uncompress it, also an in-memory operation?

Did I get this right?

~~~
storyinmemo
The paper on the algorithm is here:
[http://www.vldb.org/pvldb/vol8/p1816-teller.pdf](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)

Somebody implemented the algorithm in go based on the paper here:
[https://github.com/dgryski/go-tsz](https://github.com/dgryski/go-tsz)

A short answer that may address your question: the bits stored in RAM are XOR
values relative to previous values. To reconstruct a value, a series of
read-and-XOR operations is performed.

~~~
jdonaldson
They refer to it as "delta of delta", which implies the encoding handles
things like acceleration and momentum. I'm guessing the healthcare data has
much more of this sort of behavior than your typical server event time series.

~~~
Scaevolus
Not quite. Delta of deltas means that periodic timestamps turn into a long run
of zeros (since the difference between each point is constant), which is very
easy to compress. In the Gorilla paper, 96% of timestamps compress down to
zero, which is encoded by a single zero bit.

For series without periodic sampling, delta-of-deltas performs similarly to
normal deltas.
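
A quick Python sketch of delta-of-deltas (my own illustration, not Beringei's
code) shows why periodic timestamps collapse to runs of zeros:

```python
def delta_of_delta(timestamps):
    """Encode timestamps as second-order differences: the first entry
    is kept as-is, the second as a plain delta, and every subsequent
    entry as the change in delta (zero when sampling is periodic)."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        # First delta has no predecessor; store it directly.
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev_delta = delta
    return out
```

For a series sampled every 60 seconds, `delta_of_delta([60, 120, 180, 240])`
yields `[60, 60, 0, 0]`; those trailing zeros are what the single-bit encoding
in the paper exploits.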

------
irfansharif
General note on when companies open-source non-trivial projects but squash all
pre-release commits: not being able to dig through implementation history and
change logs makes it difficult to jump in, despite the code being publicly
visible.

Context: initial commit with 277,989 additions [0].

[0] -
[https://github.com/facebookincubator/beringei/commit/17a6c2d...](https://github.com/facebookincubator/beringei/commit/17a6c2d44783ddbef68cdc44619ef824d63215e8)

------
cocoflunchy
What would HN suggest to store about 1GB of data per day, mostly for archiving
and offline analysis, with less than 10 columns including timestamp? We're
currently writing everything to our postgres DB and flushing the table to S3
every few days but it's killing the app performance under high loads. I'm
looking for something that is easy to set up and keep running with low to no
maintenance.

~~~
pdq
CSV

Same way most web servers log traffic.

~~~
Mahn
Or perhaps SQLite, since it should perform similar to writing to a CSV file
and you get querying out of the box afterwards.
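
A minimal sketch of what that could look like with Python's stdlib `sqlite3`
(the `events` table name and columns here are hypothetical stand-ins for the
~10-column schema described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        ts    INTEGER NOT NULL,   -- unix timestamp
        name  TEXT    NOT NULL,
        value REAL
    )
""")

# Batching inserts inside a single transaction (the `with` block)
# is what keeps SQLite write throughput comparable to appending CSV.
rows = [(1487000000 + i, "cpu", float(i)) for i in range(1000)]
with conn:
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Unlike a plain CSV file, querying comes for free.
(count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
```

The resulting database file can then be shipped to S3 on whatever schedule
suits the archiving step.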

~~~
saosebastiao
SQLite is a major powerhouse for stuff like this, and if your analysis is on a
single computer with a single disk, it's almost always faster than any sort of
highly parallel big data setup. I've loaded, analyzed, and reported on 2TB of
data using SQLite in less time than it took an 8-machine Spark cluster to load
the data.

------
ibotty
I am very interested in a comparison by the Prometheus developers. Brian Brazil
recently wrote about optimizations to Prometheus' time series storage.

------
bluecarbuncle
Prometheus' implementation of the same:

* [https://prometheus.io/docs/operating/storage/#chunk-encoding](https://prometheus.io/docs/operating/storage/#chunk-encoding)

* [https://github.com/prometheus/prometheus/blob/127332c56f85b7...](https://github.com/prometheus/prometheus/blob/127332c56f85b71af24cb87bea1fe7a72e60899a/storage/local/chunk/varbit.go#L37)

------
kvz
How would this stack up against InfluxDB?

------
erichocean
We're getting good results from [http://traildb.io/](http://traildb.io/) and
we don't have to grant Facebook a worldwide, royalty-free license to our
patent pool in order to use it.

~~~
perlgeek
[https://github.com/facebookincubator/beringei/blob/master/LI...](https://github.com/facebookincubator/beringei/blob/master/LICENSE)
doesn't even mention patents.

[https://github.com/facebookincubator/beringei#license](https://github.com/facebookincubator/beringei#license)
says "We also provide an additional patent grant."

None of this sounds to me like _you_ have to grant Facebook any patents to use
Beringei.

~~~
ivank
[https://github.com/facebookincubator/beringei/blob/master/PA...](https://github.com/facebookincubator/beringei/blob/master/PATENTS#L14-L26)

