
InfluxDB has taken its open-source business to Silicon Valley - rhoml
http://technical.ly/brooklyn/2015/03/25/influxdb-paul-dix-moves-to-silicon-valley/
======
shanemhansen
I really want to love influxdb because I think the world needs a better answer
to time series databases that doesn't include java (OpenTSDB, Cassandra). The
underlying storage engine (leveldb/rocksdb) is quite solid. I'm currently
running 3 nodes in production (for collecting stats) and doing a few thousand
writes/s. I'm not using any of the clustering features, I probably won't even
evaluate that until 0.9.

I'm currently running the latest 0.8.x release and there are a few issues:

1\. My influxdb instances stop servicing reads once every 12 hours so I have a
cron job that force restarts it.
[https://github.com/influxdb/influxdb/issues/1116](https://github.com/influxdb/influxdb/issues/1116)

2\. Enabling the graphite plugin on the first run can crash the process (the
creation of the default cluster admin user seems to be racy). Not a big deal
except in automated deployment scenarios.

3\. I lost an entire database (luckily it was just used for storing grafana
graph definitions and not actual data).

4\. I'm not sure if anyone's currently working on their admin UI. I submitted
a pull request to their admin UI to sort shards by ID because currently it
randomizes the order on every load (I presume because of golang's randomized
map iteration). It's sat there since January. The last PR they merged into
that repo was in May of 2014.
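
To illustrate the map-ordering point in item 4: Go deliberately randomizes map iteration order, so any list built by ranging over a map can change order between loads. A minimal sketch of the usual fix, collecting the keys and sorting them first, is below. The `shard` type and `sortedShardIDs` helper are hypothetical names for illustration and are not taken from the InfluxDB or admin UI code.

```go
package main

import (
	"fmt"
	"sort"
)

// shard is a hypothetical stand-in for whatever record the UI lists.
type shard struct {
	ID   int
	Name string
}

// sortedShardIDs returns the map's keys in ascending order, so callers
// get a stable ordering no matter how Go iterates over the map.
func sortedShardIDs(shards map[int]shard) []int {
	ids := make([]int, 0, len(shards))
	for id := range shards {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	shards := map[int]shard{
		3: {ID: 3, Name: "c"},
		1: {ID: 1, Name: "a"},
		2: {ID: 2, Name: "b"},
	}
	// Iterate over the sorted keys instead of ranging over the map directly.
	for _, id := range sortedShardIDs(shards) {
		fmt.Println(id, shards[id].Name)
	}
}
```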

I really want influxdb to be successful. Every organization I've worked for in
the last few years has serious graphite scaling issues and influxdb is well
positioned to fix those. I think even in its current state it's a better
option than graphite (and the influxdb-graphite plugin gives you all the
graphite features).

~~~
pauldix
Hi Shane, thanks for the encouragement and sorry you're having a few problems
with the current 0.8.8 release.

We're heads down working on 0.9.0 and won't be doing any more releases in the
0.8.x line (except to create a migration path to 0.9.0). So we are merging
PRs, but only those that apply to 0.9.0 (which includes the admin UI).

------
infinotize
Another InfluxDB user here. I'd done some evaluations with OpenTSDB and the
Graphite suite, and while I had some concerns with stability and maturity the
main things that sold me on it were:

* No dependencies. Compare this with setting up HDFS/HBase and Graphite which is a real pain in the neck to manage, especially since my tsdb has to run on an arbitrary machine pool in a sandbox.

* Active development. This is a big one. Releases have been coming steadily and Paul & co. do a good job of having a real roadmap and chipping away at it; this is probably my tipping point over Graphite.

* Clustering. Maybe it's not there yet, but see above. Most tools in this space are not elastic at all.

* Grafana integration - seems like there is a good bit of momentum in that project in general which is promising.

PS Reading this it almost sounds like an ad, no I'm not affiliated with
influx.

PPS logfile configuration for rotation/cleanup would be a nice-to-have
enhancement ;)

~~~
pauldix
Thanks, we're working hard on getting the clustering features complete so we
have a real answer for HA, failover, and scalability (up to a point based on
current design).

For logfile rotation our recommended solution is to use logrotate. We'll be
updating the install to include a config. See
[https://github.com/influxdb/influxdb/issues/1943](https://github.com/influxdb/influxdb/issues/1943)

------
gtrubetskoy
Incidentally, I wrote a blog post on it last week:
[http://grisha.org/blog/2015/03/20/influxdb-data/](http://grisha.org/blog/2015/03/20/influxdb-data/)

The site says "production ready in March" \- it seemed to me like there's at
least 3 months of work there given that most of the clustering features (e.g.
how to rebuild a failed node, how to expand the cluster, distributed queries)
are not there.

My other concern is that InfluxDB might follow the fate of FoundationDB: get
acquired by a giant corporation and disappear.

~~~
pauldix
Hi Grisha, I saw that post, thanks for writing it! The coming features you're
talking about are the work we're focused on for finishing this release. The
three you mention should drop in an RC within two weeks.

The distributed queries part isn't a large amount of work because of how
we've designed things. Under the covers the query engine already represents
each query as a MapReduce job to be run.

For cluster expansion, work is starting on that today. Again it's just a
matter of wiring some things up. Node replacement is also starting today.

We may miss the March goal but it won't be by anything close to 3 months. Glad
you're paying attention to the project though :)

For the Foundation problem, I thought they were never open source. Just free
for 5 nodes or less, no?

I think the key to avoiding this fate is to build an active community of
contributors outside the company. Luckily we have people submitting PRs every
week. We'll be trying to document more of the code and make it easier for
outsiders to get involved as we go along.

That way if the worst happens, at least the community can fork and keep the
project going forward. I'd love nothing more than for Influx to become bigger
than this company.

~~~
gtrubetskoy
Thanks Paul! So you're saying it's all a SMOP :)

Another thing that I think might be a critical (or at least interesting)
characteristic is back-filling optimization, i.e. when you need to load a
trillion data points of historical data - this YouTube talk explains it pretty well and
talks about how OpenTSDB addresses it:
[https://www.youtube.com/watch?v=SgD3RD2Shg4](https://www.youtube.com/watch?v=SgD3RD2Shg4)

Anyhow - keep up the good work, I very much believe that in the next couple of
years "Time Series" is going to become a resume-must-include buzzword :)

~~~
pauldix
Cool, I'll have to take a look at that talk. We've had people ask about
backfilling large amounts of data so it's something we'll have to figure out.

~~~
gtrubetskoy
Another thing I was curious about is why not do all the clustering/distributed
stuff at the db level, i.e. have some sort of a distributed BoltDB-like/Raft
as a separate layer or even entirely separate project, and then InfluxDB would
be a much thinner/simpler thing. I think that in general the approach of
OpenTSDB and similar things is right, it's just that HBase/Hadoop is such a
pain to set up and maintain (and so is Cassandra, if perhaps a little less).

~~~
pauldix
One of the key goals of the project is to be able to aggregate and downsample
from raw high precision data. That means we want a framework in which we ship
the code to where the data lives, not the other way around.

The abstractions I've seen that have the database layer and then some services
on top all miss this. They transport all of the raw data over the network and
then run the computations and return the summary ticks back to the user.

Our framework lets us compute the summary ticks locally and send only those
back (in many cases, but not all).
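
As an illustration only, and not InfluxDB's actual query engine, the "ship the code to the data" idea can be sketched as a map step that runs on each node and a reduce step that merges the small per-bucket summaries. All names here (`Point`, `Partial`, `mapLocal`, `reduceMeans`) are hypothetical, and timestamps are assumed to be non-negative Unix seconds.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is one raw sample: a Unix timestamp in seconds and a value.
type Point struct {
	Unix  int64
	Value float64
}

// Partial is the per-bucket summary a node sends back instead of raw points.
type Partial struct {
	Sum   float64
	Count int
}

// mapLocal runs where the data lives: it groups raw points into fixed-width
// time buckets and keeps only a running sum and count per bucket.
func mapLocal(points []Point, width int64) map[int64]Partial {
	out := make(map[int64]Partial)
	for _, p := range points {
		bucket := p.Unix - p.Unix%width // start of the bucket containing p
		agg := out[bucket]
		agg.Sum += p.Value
		agg.Count++
		out[bucket] = agg
	}
	return out
}

// reduceMeans merges the partial summaries from every node and turns them
// into one mean per bucket. Only the summaries cross the network.
func reduceMeans(parts ...map[int64]Partial) map[int64]float64 {
	merged := make(map[int64]Partial)
	for _, part := range parts {
		for bucket, p := range part {
			agg := merged[bucket]
			agg.Sum += p.Sum
			agg.Count += p.Count
			merged[bucket] = agg
		}
	}
	means := make(map[int64]float64, len(merged))
	for bucket, p := range merged {
		means[bucket] = p.Sum / float64(p.Count)
	}
	return means
}

func main() {
	nodeA := []Point{{Unix: 0, Value: 1}, {Unix: 30, Value: 3}, {Unix: 60, Value: 10}}
	nodeB := []Point{{Unix: 10, Value: 5}, {Unix: 70, Value: 20}}
	means := reduceMeans(mapLocal(nodeA, 60), mapLocal(nodeB, 60))

	// Sort bucket starts so the output order is stable.
	buckets := make([]int64, 0, len(means))
	for b := range means {
		buckets = append(buckets, b)
	}
	sort.Slice(buckets, func(i, j int) bool { return buckets[i] < buckets[j] })
	for _, b := range buckets {
		fmt.Println(b, means[b])
	}
}
```

A mean merges cleanly because sums and counts add up across nodes. An exact median does not reduce this way, which is one plausible reading of the "not all" caveat.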

------
CSDude
I am using InfluxDB in my research to analyze the resource utilization of
running applications, and it has been very useful to me, but I think it
was supposed to be production ready this March. There are some bugs that
occur occasionally.

~~~
pauldix
We're busy at work on the production ready version. We're targeting March, but
we won't release until it's ready (even if that means slipping our target).

Remember, in software development there are lies, damn lies, and delivery
estimates.

We'll get it out as quickly as possible, sorry for any delays.

~~~
eik3_de
can you say something about the upgrade path, will that be possible to do
live?

~~~
pauldix
We haven't built the migration tool yet, but most likely it will involve
running a new version in parallel with the old version while the upgrade runs.

For a guide on how to design your schema for a clean migration see here:
[http://influxdb.com/docs/v0.8/advanced_topics/schema_design....](http://influxdb.com/docs/v0.8/advanced_topics/schema_design.html#migration-to-0.9.0)

------
erichmond
Great product and smart pivot, I hope they do well. Them blowing up would be
another win for the NYC tech scene (indirectly).

~~~
pauldix
Thanks Eric!

------
jacques_chester
Denver office? Sounds like someone recruited a few Pivotal Labs alumni :)

~~~
pauldix
We have, but sadly not in Denver... yet ;)

------
bad_user
Any comparisons with KairosDB or OpenTSDB?

~~~
iolco51
KairosDB is less popular, smaller, but with different limitations and is more
flexible. It was inspired by the OpenTSDB design but then took a different
path.

OpenTSDB relies on HBase; KairosDB has a configurable, pluggable datastore,
but the only production-ready one so far uses Cassandra.

OpenTSDB always does interpolation of values for aggregation (which I found to
be a hazardous decision), KairosDB does not really do proper series
"vertical" aggregation (by vertical I mean not downsampling).

OpenTSDB is GPL, KairosDB is Apache 2.0 (that counts for closed-source
integrations).

OpenTSDB supports only numerical data but supports annotations. KairosDB
supports strings and numerical data out of the box and is compatible with any
data type; it does not have annotations (but you may use strings for that).

On their baseline OpenTSDB produces graphs on the server, KairosDB produces
graphs on the client.

Both are integrated with the Grafana time series dashboard. OpenTSDB has more
side projects; KairosDB is the only time series database I know of that is
integrated with a reporting tool (BIRT).

OpenTSDB requires you to create metrics in advance using a special tool (it
needs to lock the cluster to allocate a new ID), KairosDB can accept any kind
of new metric on the fly.

If you need something modular for building custom features I strongly
recommend KairosDB. Look at the code, it's really nicely crafted.

Otherwise, they both have their merits. I found that KairosDB is also much
less limited in corner cases (while having fewer side projects), and we now
use KairosDB intensively.

------
lasermike026
Congrats Paul!

Mike (BMark Admin)

~~~
pauldix
thanks Mike!

