
A Fast Lightweight Time-Series Store for IoT Data - rch
http://arxiv.org/abs/1605.01435
======
chris_va
I was a little disappointed to see that this was just a single-machine
hierarchical time index on top of a pre-sorted append log.

It's not a bad architecture, and the lockless queuing system and shared memory
are cute, but:

\- It _only_ supports time indexing. Querying by other fields (e.g. if you
wanted to build a time histogram of when you saw a particular event) requires
reading the entire dataset.

\- It doesn't address replication/distributed indexing, which seems like a
must.

\- The use of calendar time over micro-timestamps also needlessly complicates
things, though presumably it makes their queries more efficient (assuming
people query by discrete time buckets like "yesterday"; see the sketch below).
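
A minimal Go sketch of that tradeoff (the timestamp value is made up): raw
micro-timestamps get folded into the kind of calendar bucket a
"yesterday"-style query would hit.

```go
// Folding a raw microsecond timestamp into a day-granularity calendar
// bucket, roughly the unit a "yesterday" query selects on.
package main

import (
	"fmt"
	"time"
)

func main() {
	micros := int64(1462406400123456) // raw microsecond timestamp (made up)
	t := time.UnixMicro(micros).UTC()

	// Truncate to midnight UTC and render a day-bucket key, e.g. "2016-05-05".
	bucket := t.Truncate(24 * time.Hour).Format("2006-01-02")
	fmt.Println(bucket)
}
```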

------
halayli
At the risk of sounding cynical, this feels like a solution from people who
aren't familiar with what the problem is about, ending up optimizing the
wrong areas.

A few observations here:

* Using UDP means you don't care if you lose data, and that's bad. If your buffers are full, your data will be silently dropped, on top of losses from network glitches and metric sizes that can easily exceed 512 bytes. Your buffers can fill up while you wait on disk IO, etc. (see the sketch below).
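
A minimal Go sketch of that failure mode (the collector address is
hypothetical): the sender gets no signal at all when datagrams are dropped
downstream.

```go
// Fire-and-forget UDP metric sender. Write returns as soon as the datagram
// leaves the local socket; drops caused by a full receive buffer on the
// collector (e.g. while it waits on disk IO), network glitches, or
// oversized payloads never surface here.
package main

import (
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("udp", "collector.example.com:8089") // hypothetical
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	for i := 0; i < 1_000_000; i++ {
		metric := fmt.Sprintf("cpu.load host=h1 value=%d %d", i%100, i)
		if _, err := conn.Write([]byte(metric)); err != nil {
			// Only local errors ever show up; silent remote drops do not.
			fmt.Println("local send error:", err)
		}
	}
}
```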

Optimizing your network stack when your bottleneck is disk IO doesn't buy you
much. But the real issue they haven't tackled is data distribution. If you
can distribute your data horizontally, then when you hit a bottleneck
(whatever it is) you can always increase your cluster size and relieve it (a
sketch of one common approach follows). In the current solution, if the
machine dies, you will lose tons of data.
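
A minimal sketch of one common way to get that distribution (node names are
hypothetical): a consistent-hash ring assigns each series to a node, so
growing the cluster moves only a fraction of the keys.

```go
// Consistent-hash ring: series keys and nodes hash onto the same 32-bit
// circle; a key belongs to the first node clockwise from its hash.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points []uint32          // sorted node positions on the circle
	nodes  map[uint32]string // position -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(nodes ...string) *ring {
	r := &ring{nodes: map[uint32]string{}}
	for _, n := range nodes {
		p := hash32(n)
		r.points = append(r.points, p)
		r.nodes[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// node returns which node owns a given series key.
func (r *ring) node(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the circle
	}
	return r.nodes[r.points[i]]
}

func main() {
	r := newRing("node-a", "node-b", "node-c")
	fmt.Println(r.node("sensor-42/temperature")) // deterministic placement
}
```

(Real systems also add virtual nodes and per-key replicas, which is what
keeps one dead machine from taking tons of data with it.)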

COW pipelining has existed for a very long time now. There's nothing new in
what they are introducing.

The persistent data they hold in memory is not guaranteed to be recovered if
they crash before an fsync/msync.
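
A minimal Go sketch of that window, for Linux, with a hypothetical file name:
everything written into the mapping is visible immediately, but only durable
after the msync.

```go
// Writes into an mmap'd region sit in the page cache; a crash before
// msync/fsync can lose them even though readers already saw the data.
package main

import (
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.OpenFile("points.db", os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := f.Truncate(4096); err != nil {
		panic(err)
	}

	buf, err := unix.Mmap(int(f.Fd()), 0, 4096,
		unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(buf)

	copy(buf, []byte("datapoint")) // visible to readers right away...

	// ...but only durable after this; crash first and the write is gone.
	if err := unix.Msync(buf, unix.MS_SYNC); err != nil {
		panic(err)
	}
}
```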

A secondary index on the time column is not enough. Queries often need only
particular keys, and you'll end up doing full table scans.
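
A minimal Go sketch of why the key matters (series names are made up): with
data laid out by (series, time) you can seek straight to one series' range,
whereas a time-only layout forces a scan of everything.

```go
// Rows sorted by (series, ts). Binary search finds one series' slice
// without touching the rest of the table.
package main

import (
	"fmt"
	"sort"
)

type point struct {
	series string
	ts     int64
}

func main() {
	data := []point{ // already sorted by (series, ts)
		{"evt.a", 1}, {"evt.a", 5}, {"evt.b", 2}, {"evt.b", 3}, {"evt.c", 4},
	}

	// Seek to the "evt.b" range instead of scanning all rows.
	lo := sort.Search(len(data), func(i int) bool { return data[i].series >= "evt.b" })
	hi := sort.Search(len(data), func(i int) bool { return data[i].series > "evt.b" })
	fmt.Println(data[lo:hi]) // [{evt.b 2} {evt.b 3}]
}
```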

~~~
yeukhon
I think udp vs tcp is debatable depending on the kind of data. If you are
collecting continuous telemetry, losing a few points may not be the end of
the world. Of course, YMMV, so you should pick a database based on your
objective.

> Optimizing your network stack when your bottleneck is disk io doesn't buy
> you much.

Agree, but also the famous "it depends." If you are collecting A LOT of data
over the network, first make sure your network card is capable of handling
the incoming traffic. An analogy would be a 100M vs 1G port: at 1 Gbit/s,
512-byte metrics top out around 240k/s even before protocol overhead.

Also, if you are running on an embedded device like a Raspberry Pi or
Arduino, I think disk will definitely be the first bottleneck, before you
even have a chance to respond to network saturation.

------
TheGuyWhoCodes
I'm surprised Cassandra isn't mentioned. It's not specifically made for time
series and it takes some tricks to make it work nicely (bucketing, sketched
below), but it's a solid solution.
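
A minimal sketch of the bucketing trick via the gocql driver (the keyspace,
table and schema are hypothetical): folding a day bucket into the partition
key keeps any one partition from growing without bound.

```go
// Writes go to partition (sensor_id, day), with rows clustered by ts:
//   CREATE TABLE sensor_data (
//     sensor_id text, day text, ts timestamp, value double,
//     PRIMARY KEY ((sensor_id, day), ts));
package main

import (
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "iot" // hypothetical keyspace
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()

	now := time.Now().UTC()
	day := now.Format("2006-01-02") // the bucket
	if err := session.Query(
		`INSERT INTO sensor_data (sensor_id, day, ts, value) VALUES (?, ?, ?, ?)`,
		"sensor-42", day, now, 23.5,
	).Exec(); err != nil {
		panic(err)
	}
}
```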

~~~
throwaway_exer
Cassandra is not a solid solution.

After problems with a node, the management tools often cause additional
problems.

I suggest anybody follow the cassandra-users list for a while before
committing to it.

~~~
statictype
Are you saying Cassandra is not suitable for time series data or that it has
issues in general as a database engine?

------
tluyben2
Incidentally, I was reading an old HN post today on Kdb+ for time series,
where people who use that commercial DB said it's amazingly fast (I only
played around with it, nothing speedy). People in that discussion mentioned
that the 300k writes/s cited for Kdb+ is nothing compared to the millions of
writes/s IoT solutions need. So I am wondering what kind of performance you
can get from a time series database on a single machine. I understand that
with clustering, systems like Cassandra can do 100 million/s and more
(someone mentioned billions, but not which DB), but benchmarks that just
throw hardware at the problem only make sense to me if we also know
single-node performance, single-node multicore performance, and then the
scaling graph when adding nodes.

I am just interested in this kind of thing; I used to work with time series
for a German telco (DB2 on heavy IBM metal, for massive amounts of money)
and on financial work for a startup. I wonder what the state of the art here
is.

~~~
gtfierro
Another comment here mentioned a recent FAST paper talking about BtrDB, which
can get ~16mil writes/sec (nanosecond timestamp, 8 byte values) on a single
node, with near-linear speedup when clustering:
[https://blog.acolyer.org/2016/05/04/btrdb-optimizing-storage...](https://blog.acolyer.org/2016/05/04/btrdb-optimizing-storage-system-design-for-timeseries-processing/)

------
rar_ram
Correct me if I am wrong: InfluxDB is written in Go, not Java as Table 1 in
the paper states.

~~~
minaandrawos
I think you are right

------
bigger_cheese
In the industrial control system world we call time series DBs 'Enterprise
Historians', and they are well optimised. Good ones support storing data at
the native frequency of the control system, including sub-second intervals.
Basically, whatever frequency the instrument can sample at, these things can
record.

They also have good compression and buffering support, including catch-up
functionality (a sketch of one common compression trick follows).

They are by no means lightweight - usually they rely on PLCs or a SCADA
system for inputs. They may not offer a complete solution for IoT, but I'm
sure there are lessons that can be adapted.
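
A minimal Go sketch of one such compression trick (sample values are made
up): regularly sampled instruments produce near-constant timestamp deltas, so
delta-of-delta encoding yields mostly zeros, which bit-pack extremely well.

```go
// Delta-of-delta encoding of timestamps: store the first delta, then only
// how each subsequent delta differs from the previous one.
package main

import "fmt"

func deltaOfDelta(ts []int64) []int64 {
	if len(ts) < 2 {
		return nil
	}
	out := make([]int64, 0, len(ts)-1)
	prevDelta := ts[1] - ts[0]
	out = append(out, prevDelta) // first delta stored as-is
	for i := 2; i < len(ts); i++ {
		d := ts[i] - ts[i-1]
		out = append(out, d-prevDelta) // usually 0 for steady sampling
		prevDelta = d
	}
	return out
}

func main() {
	ts := []int64{1000, 2000, 3000, 4001, 5001} // 1s samples, one late arrival
	fmt.Println(deltaOfDelta(ts))               // [1000 0 1 -1]
}
```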

------
sqozz
Does anybody know why the paper states that influxdb is implemented in Java?

~~~
jmcgough
Probably accidentally copied the column from opentsdb, which is right below it

------
qaq
Prometheus, InfluxDB, OpenTSDB... and on and on.

It seems it's a pretty crowded space.

~~~
willempienaar
It's crowded because there isn't a clear leader.

~~~
bbrazil
Speaking as a Prometheus developer: each is suited to different things.

For example, if you want to store timeseries data long-term and already use
HBase, then OpenTSDB is a good choice.

On the other hand, if you want monitoring that's simple and dependable in an
emergency, with querying, graphing and alerting over short/medium-term data,
then Prometheus would be a good choice (a minimal sketch of that
instrumentation model follows).
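
A minimal sketch of that model using the prometheus/client_golang library
(the metric name and port are made up): the application exposes counters over
HTTP, and the Prometheus server scrapes, stores and alerts on them.

```go
// Expose a request counter at /metrics for Prometheus to scrape.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requests = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "myapp_requests_total", // hypothetical metric name
	Help: "Total requests handled.",
})

func main() {
	prometheus.MustRegister(requests)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requests.Inc()
		w.Write([]byte("ok"))
	})
	http.Handle("/metrics", promhttp.Handler()) // scraped by the server
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```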

~~~
pmahoney
> short/medium-term data

This in particular - but really your entire post - would be a great addition
to the Prometheus front page, or the top of the documentation section. It
wasn't clear to me initially, speaking as someone who evaluated Prometheus a
few months ago.

~~~
bbrazil
We plan on seamlessly integrating with long-term storage
([https://prometheus.io/docs/introduction/roadmap/#long-term-s...](https://prometheus.io/docs/introduction/roadmap/#long-term-storage))
and OpenTSDB is one option for that for us.

Ignoring that as it's planned work, the typical considerations are more
around availability vs. consistency, and that we're a metrics system focused
on operations rather than an event store. Most of that's already covered in
our docs. See
[https://prometheus.io/docs/introduction/overview/#when-does-...](https://prometheus.io/docs/introduction/overview/#when-does-it-fit)
and
[https://prometheus.io/docs/introduction/faq/](https://prometheus.io/docs/introduction/faq/)

------
rodionos
The primary dataset used for testing is "USGS archived earthquake data". Does
anyone have a link to the actual dataset?

------
mianos
These need the traditional YA prefix. Like YACC, YATSD or YATSS?

------
rbolla
druid??

~~~
SEJeff
It isn't a small footprint, but Druid is the best-of-breed yet most often
underrated tool in this space. Influx gets the press because of how simple it
is to set up, but it epically failed at clustering by trying to invent its
own multi-raft magic. If you're not analyzing a lot of data, Druid doesn't
make sense to set up, but if you are, it is simply incredible. I can do
analytics on 5+ dimensions over 10 billion datapoints fast enough to be very
responsive in Grafana; a sketch of such a query follows the link below. Their
most recent release looks quite interesting as well.

[https://groups.google.com/forum/m/#!topic/druid-user/nqqb5RI...](https://groups.google.com/forum/m/#!topic/druid-user/nqqb5RIdBbs)
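
A minimal Go sketch of what such a query looks like (the datasource and
broker address are hypothetical): Druid's native queries are JSON POSTed to
the broker, here a topN over one dimension of a time range.

```go
// POST a native topN query to a Druid broker and print the response.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	query := []byte(`{
	  "queryType": "topN",
	  "dataSource": "metrics",
	  "dimension": "host",
	  "metric": "count",
	  "threshold": 10,
	  "granularity": "all",
	  "intervals": ["2016-05-01/2016-05-02"],
	  "aggregations": [{"type": "count", "name": "count"}]
	}`)

	resp, err := http.Post("http://druid-broker:8082/druid/v2/",
		"application/json", bytes.NewReader(query))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```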

~~~
raarts
Citation for the epic fail?

~~~
SEJeff
They redid it a few times from scratch, then decided it was hard and made it
a commercial-only feature. It still isn't stable. Also, even with their
previous clustering, each Influx node had to hold the entire dataset, so
you're limited by the amount of data that fits on a single node.

Druid was built from the ground up to be distributed. That means there is
some work to set it up, but once you do, it scales horizontally very easily.
Bonus points if you run it on something like Mesos, which makes it quite easy
to deal with (we do).

[https://influxdata.com/blog/update-on-influxdb-clustering-hi...](https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/)

[https://influxdata.com/blog/influxdb-clustering-design-neith...](https://influxdata.com/blog/influxdb-clustering-design-neither-strictly-cp-or-ap/)

[http://www.refactorium.com/distributed_systems/InfluxDB-and-...](http://www.refactorium.com/distributed_systems/InfluxDB-and-Jepsen-Chapter-II-Where-is-influxdb-on-the-cap-scale/)

etc. They redid their clustering a few times, and the third time around made
it closed source. Note that $employer also uses Influx, for data that we are
OK with losing, i.e. metrics. If the data has a low number of dimensions,
Influx is faster than Druid. However, if you have 3+ dimensions, Druid spanks
the pants off of Influx by virtue of distributing the computation better.

