
Chronix: A fast and efficient time series storage - based2
http://chronix.io/#features
======
jrv
A couple of us from Prometheus met with Josef from Chronix last week. I
haven't tried Chronix yet, but it sounds great for what they're doing. It's
currently used for industrial manufacturing monitoring (think car production
lines), with a focus on offline anomaly detection. However, there are no live
queries yet - it's more about offline number crunching and storing all data
points forever. They are also Prometheus users, so they're interested in
having Prometheus as a live monitoring system and feeding long-term data into
Chronix once the interfaces for that in Prometheus become available.

~~~
bluecmd
I didn't even have to go to IRC and ask you guys about this, now that's
service! ;-).

Exciting!

------
pstuart
A comparison to InfluxDB would be interesting.

~~~
f_l_a
There are some publications including a comparison of the storage efficiency,
memory footprint, and query runtimes. For example:

[https://speakerdeck.com/florian_lautenschlager/the-new-time-series-kid-on-the-block?slide=24](https://speakerdeck.com/florian_lautenschlager/the-new-time-series-kid-on-the-block?slide=24)

Database versions: InfluxDB (v0.10.3-1), Graphite (v0.9.15), OpenTSDB (v2.2.0), KairosDB (v1.1.1)

------
ronnier
If you are interested in time series storage, I'm keeping a list of
interesting ones
[http://ronnieroller.com/metrics](http://ronnieroller.com/metrics)

Would be happy to hear of ones I might be missing.

------
jaytaylor
I'd be interested to know what drove the technology choices for Java and
Apache Solr.

~~~
mtrn
For one, Lucene[1] is quite amazing. With a slight stretch of imagination, you
can think of it as a document-oriented database with _all_ fields indexed by
default. Imagine loading 100M rows with 100 columns into an RDBMS and then
indexing _each_ column[2]. You wouldn't normally do that. With Lucene - and
Solr and Elasticsearch for that matter - it's not even a thought. Additionally,
you get to tweak your indexing with dozens of field-tested analyzers.

[1] [http://lucene.apache.org/core/](http://lucene.apache.org/core/)

[2] we do this (over 300G of complex data) with a tool called solrbulk, on a
single server, in less than four hours:
[https://github.com/miku/solrbulk](https://github.com/miku/solrbulk)
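The "everything indexed by default" idea can be sketched roughly like this (a toy inverted index in plain Python, not the actual Lucene API - class and method names are made up for illustration):

```python
from collections import defaultdict

class ToyIndex:
    """Toy inverted index: every field of every document is indexed,
    which is roughly what Lucene/Solr give you out of the box."""

    def __init__(self):
        # (field, value) -> set of document ids
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, value in doc.items():
            self.postings[(field, value)].add(doc_id)

    def search(self, field, value):
        return sorted(self.postings.get((field, value), set()))

idx = ToyIndex()
idx.add(1, {"metric": "temperature", "host": "a"})
idx.add(2, {"metric": "pressure", "host": "a"})
idx.add(3, {"metric": "temperature", "host": "b"})

print(idx.search("metric", "temperature"))  # [1, 3]
print(idx.search("host", "a"))              # [1, 2]
```

In an RDBMS you would have to create an index per column explicitly; here (as in Lucene) every field is queryable as soon as the document is added.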

~~~
jaytaylor
I agree that Lucene is a mature solution with many excellent use-cases.

For this portion of the stack, why would it be preferable to start with Lucene
rather than Elasticsearch (with all of its distributed scaling benefits)?

~~~
mtrn
I would probably start with Elasticsearch and establish a business case first.
Only once that's proven, and a need for lower-level customizations arises,
would I descend the stack.

------
iamcreasy
What's a "time series storage"?

~~~
dingaling
Time-series data is chronological and immutable data that has time as a
primary dimension; think of the output of a weather-monitoring station, for
example, which streams air pressure, wind speed and direction, and temperature
at (say) one-second intervals.

You can stuff all that into a standard RDBMS with appropriate schema design,
but there's a lot of wasted overhead: you're not going to execute updates, for
example, and there is probably a lot of smart compression achievable for long
series of unchanging data, such as air pressure, which might remain constant
for hours and then change gradually.
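Real time-series stores use fancier encodings, but the compression win on long unchanging runs can be sketched with simple run-length encoding (a toy Python example, not what any particular store actually does):

```python
def rle_encode(values):
    """Run-length encode a list: long constant stretches (e.g. air
    pressure that holds steady for hours) collapse to one [value, count]."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] runs back into the original list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# One reading per second: an hour at 1013 hPa, then a slow change.
pressure = [1013] * 3600 + [1012] * 1800 + [1013] * 900
runs = rle_encode(pressure)
print(runs)  # [[1013, 3600], [1012, 1800], [1013, 900]]
assert rle_decode(runs) == pressure
```

6300 stored samples shrink to three runs; a generic RDBMS row-per-sample layout gets none of this for free.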

~~

A slightly more obscure example is ADS-B data streamed from aircraft. I built
a logging system and schema a few years back, and in order to handle the input
volume it held the raw data on a ramdisk and ran summarisation logic before
inserting into an RDBMS. So it persisted the minimum and maximum altitudes of
a particular aircraft in a particular period, for example, but in doing so we
lost data on the intermediate vertical flightpath and rates of change. And
each aircraft had multiple time periods for each encounter. It rapidly became
quite complicated compared to using a time-series store, where the data would
be stored in sequence and the complicated logic moved to the offline query
stage rather than the up-front storage pre-processing stage.
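The summarisation step described above might look something like this (a minimal Python sketch; the bucket size and field names are invented for illustration, not taken from the actual system):

```python
PERIOD = 60  # seconds per summary bucket (assumed value)

def summarise(readings):
    """Collapse raw (aircraft, timestamp, altitude) readings into
    per-aircraft, per-period [min_altitude, max_altitude] buckets.
    Everything between the min and max is discarded - exactly the
    intermediate-flightpath data loss described above."""
    buckets = {}
    for aircraft, ts, alt in readings:
        key = (aircraft, ts // PERIOD)
        if key not in buckets:
            buckets[key] = [alt, alt]
        else:
            lo, hi = buckets[key]
            buckets[key] = [min(lo, alt), max(hi, alt)]
    return buckets

raw = [("AB123", 0, 10000), ("AB123", 30, 10500), ("AB123", 65, 11000)]
print(summarise(raw))
# {('AB123', 0): [10000, 10500], ('AB123', 1): [11000, 11000]}
```

A time-series store would instead keep all three raw points in sequence and leave the min/max (or any other aggregation) to query time.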

