
InfluxDB – Open-source distributed time-series, events, and metrics database - mnutt
http://influxdb.org/
======
siliconc0w
I was actually looking at a bunch of open source time series databases and
settled on kairosdb but this looks pretty nice.

I think there is a hackernews rule someplace that a more interesting tech
alternative shows up right when you decided to go with something else.

For reference here is the list I created when researching these:

[http://opentsdb.net/overview.html](http://opentsdb.net/overview.html) Built on HBase

[http://www.gocircuit.org/vena.html](http://www.gocircuit.org/vena.html) Built on Go + Go Circuit, uses Google's LevelDB

[https://code.google.com/p/kairosdb/](https://code.google.com/p/kairosdb/) A
rewrite of opentsdb which can use Cassandra

[http://blueflood.io/](http://blueflood.io/) Built by Rackspace, decent but
still seems a bit immature

[http://graphite.wikidot.com/](http://graphite.wikidot.com/) Obligatory
Graphite reference (uses whisper, new backend called 'ceres' is being
developed)

[https://github.com/agoragames/kairos](https://github.com/agoragames/kairos)
(yet another 'kairos', alternative backends for graphite - SQL, redis, or
mongo)

Riak seems to be popular with SaaS metric providers (Hosted Graphite,
Boundary). There isn't any code, but there are a couple of talks that explain
how and why they went with Riak:

[http://basho.com/hosted-graphite-uses-riak-to-store-all-customer-metrics/](http://basho.com/hosted-graphite-uses-riak-to-store-all-customer-metrics/)

[http://boundary.com/blog/tag/tech-talks/](http://boundary.com/blog/tag/tech-talks/)

[http://vimeo.com/42902962](http://vimeo.com/42902962)

~~~
philjackson
Rolling your own:

[http://blog.apiaxle.com/post/storing-near-realtime-stats-in-redis/](http://blog.apiaxle.com/post/storing-near-realtime-stats-in-redis/)

~~~
e12e
I should probably look at the code, but reading the post it isn't entirely
clear to me: are you updating all three (second, minute, hour) keys on every
insert? And are you just using the "last value" as the value for the key (so
the value for minute becomes the last value of second that falls in that
interval)?

I need to read up on rrdtool as well, but I wonder whether it would make much
difference (good or bad) to store the mean or some other average as the
"higher up" value (i.e. the average of the past 60 seconds as the minute
value)?

------
beagle3
There's a lot to be learned from the non-open source players in this space.
Specifically, kdb+ has always provided everything I needed. It's built for
HFT, and does millions of data points per second with analytics, with history
going back years.

It is, however, rather expensive.

~~~
pauldix
The stuff built for finance is definitely interesting. John and I (two of the
people working on this) previously worked at a fintech startup with a
closed-source time series DB called OneTick. It was super fast, but its API
didn't work well for analytics/events use cases. Great for fast-moving market
data, though.

~~~
beagle3
I was trying to recall the name of OneTick ... It does work well for market
data (especially with corrections and stuff), but is not a convenient general
database.

kdb+, on the other hand, is a time series database that works perfectly well
as a general database, with a query language that is at the same time
infinitely simpler than SQL and yet much faster, more expressive and useful.
There's a learning curve, it is steep, but it is well worth it.

------
pauldix
I'm one of the committers. The project is still early stage. At this point
we're looking for feedback on the API, which we're planning on finalizing this
month. Would love to hear about anything you'd like changed or added to the
API.

~~~
chubot
I wouldn't use this bastardized SQL dialect. SQL comes from relational
databases, which come from relational algebra, which is an exceptionally poor
model for time series data. It's going to end up a mess and confuse people.
SQL is already a mess by itself.

I would simply use functions and operators over time series or data frame
types. Perhaps take a look at the R zoo library for examples of more advanced
things people do with time series.
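A hedged sketch of what "functions and operators over time series types" might look like, in Python rather than R (all names here are invented for illustration; R's zoo/xts offer far richer semantics):

```python
class Series:
    """A toy time-series value: sorted (timestamp, value) pairs with
    chainable operators, in the spirit of zoo/xts rather than SQL."""
    def __init__(self, points):
        self.points = sorted(points)

    def map(self, fn):
        # Apply fn to every value, keeping timestamps.
        return Series([(t, fn(v)) for t, v in self.points])

    def filter(self, pred):
        # Keep only the points whose value satisfies pred.
        return Series([(t, v) for t, v in self.points if pred(v)])

    def diff(self):
        # Change between consecutive observations.
        ts = [t for t, _ in self.points]
        vs = [v for _, v in self.points]
        return Series(list(zip(ts[1:], (b - a for a, b in zip(vs, vs[1:])))))

# Chained and composable, with no query strings to parse:
latency = Series([(1, 10), (2, 13), (3, 11)])
spikes = latency.diff().filter(lambda v: v > 0)
```

The point of the argument is that operations like `diff`, windowed means, and alignment of two series compose naturally as functions, whereas SQL forces them through GROUP BY and self-joins.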

~~~
pauldix
We opted for the bastardized SQL because we thought it would be easier to
understand. Of course, that may not be the case which is why we'd like to hear
what people think of it. We based our decision on people using our API in
Errplane and not understanding some parts of it without a lot of additional
explanation. One of our users said about one part of it "oh, that's just like
group by".

I'm curious, did you find the SQL dialect readable and understandable?

~~~
taf2
I really like SQL. As I was learning it I thought it was terrible, but after
years of use it really makes getting to data easy. I'm very excited to see
you choose SQL instead of JSON or something less query-like to query with...

~~~
beagle3
... but that's sort of a stockholm syndrome; it's because you know SQL and it
makes you feel comfortable, and you can think of worse options (like JSON).

But SQL is really a bad option in the modern world, especially since (I
estimate) about 90% of SQL statements are built programmatically; thus, a
query language that is easier to construct from code makes a lot more sense.
(And no, that's not JSON - some form of algebraic or LISPish notation makes
much more sense.)

Also, SQL semantics are horrible if you have order involved (as you always do
in time series).
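A minimal sketch of the kind of algebraic/LISPish notation being argued for, in Python: the query is plain data, so code composes it structurally instead of concatenating (and escaping) SQL strings. All names here are invented for illustration, not any real query language:

```python
def q(op, *args):
    """Build a query node as plain data: a LISP-ish s-expression."""
    return [op, *args]

def render(node):
    """Serialize a query tree for the wire; a client library would send
    this structure instead of gluing SQL fragments together."""
    if isinstance(node, list):
        return "(" + " ".join(render(a) for a in node) + ")"
    return str(node)

# Composed programmatically -- each clause is an ordinary value that can
# be built, reused, or rewritten by code before serialization:
query = q("select", q("mean", "value"),
          q("from", "cpu"),
          q("where", q(">", "time", "now-1h")))
```

Because the query is a tree rather than a string, there is no injection risk and no parsing step on the client side; that is the core of the "easier to construct from code" claim.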

~~~
taf2
I disagree - it's not Stockholm syndrome. For me it was realizing that SQL has
actually solved a lot of common and useful ways of expressing the process of
getting to the data you want, in the order you want. Simply because we write
code to generate SQL does not make it bad. I suspect you also think HTML is
bad? Being able to say select x, y, z from table where x = 1 is very simple
and very clear, IMO. I'm just giving you my opinion though... you clearly see
things differently :D

~~~
beagle3
HTML is indeed bad as a machine-generated format -- which is what it is now;
e.g. <p>, list items and a few other things don't require a close tag, but
most things do.

These things (like SQL) make sense if you assume that the input is (a) written
manually, and (b) by people who are not expected to do this "professionally".
Neither is the case of HTML nor SQL anymore.

(Seriously, SQL was originally marketed to managers with the idea that "it's
just plain english so you can do it yourself, and don't need programmers!".
You know how well _that_ worked out)

~~~
dragonwriter
> (Seriously, SQL was originally marketed to managers with the idea that "it's
> just plain english so you can do it yourself, and don't need programmers!".
> You know how well that worked out)

Pretty well, actually -- lots of nonprogrammer analysts use SQL for queries,
and IME the ones that do are consistently better able to answer questions
based on data than the ones that use "friendly" query tools, which inevitably
end up being much more limited in practice and requiring a lot more support
from both programmers and DBAs to make the data that is already available
actually accessible.

Unfortunately, lots of environments prevent direct SQL access to DBs for
"security" reasons (as if multiuser DBMSs didn't have role-based access
control as a core feature).

~~~
beagle3
I still think SQL is a needlessly verbose mistake. The same people who can
successfully write SQL for queries would have just as easily (or more easily)
been able to use some algebraic notation. I am not advocating GUI query
builders - I'm advocating a non-natural-language-looking (and hopefully
better) language, along the lines of kdb+/q.

If you can properly do inner/outer/cross/asof joins to get to the data you
want, the English-like syntax is just a burden - two queries that seem similar
in their English more often than not produce completely different results
because of SQL's 3-valued logic, the way NULLs are joined, and various other
things like that.
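The NULL behaviour referred to can be demonstrated in a few lines with SQLite via Python's standard library (any SQL engine behaves the same way here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (k INTEGER)")
con.execute("CREATE TABLE b (k INTEGER)")
con.executemany("INSERT INTO a VALUES (?)", [(1,), (None,)])
con.executemany("INSERT INTO b VALUES (?)", [(1,), (None,)])

# NULL = NULL evaluates to NULL, not TRUE, so NULL rows never match a join:
rows = con.execute("SELECT COUNT(*) FROM a JOIN b ON a.k = b.k").fetchone()[0]
# rows == 1: only the (1, 1) pair joins; both NULL rows are silently dropped.

# A bare comparison with NULL is neither true nor false -- it is the
# third truth value, which Python surfaces as None:
null_eq = con.execute("SELECT NULL = NULL").fetchone()[0]
```

Two queries that read almost identically in English (say, an inner join vs. a WHERE over a left join) can therefore return different row counts whenever NULLs are present, which is the trap being described.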

~~~
dragonwriter
> I still think SQL is a needlessly verbose mistake. The same people who can
> successfully write SQL for queries would have just as easily (or more
> easily) been able to use some algebraic notation.

I don't think that's true -- there are lots of adults who have anxiety around
"maths-like" notations, largely as a result of issues with maths education,
cultural factors, etc., despite being able to intellectually handle the
relevant manipulations -- and lots of those people end up in non-technical
business positions that have to deal with data. Many of the non-IT people
I've seen using SQL definitely fall into that group, and I don't think they'd
be as proficient with a more algebraic syntax like the comprehension syntaxes
used in many modern programming languages.

Conversely, the people that can be proficient in those syntaxes almost
certainly can be proficient in SQL, though they may complain about its
verbosity.

Sure, in a perfect world where the cultural context was different, this
wouldn't be necessary. But we don't live in that world.

------
zyang
It appears to be built with Go.
[https://github.com/influxdb/influxdb](https://github.com/influxdb/influxdb)

~~~
lttlrck
[http://influxdb.org/overview/](http://influxdb.org/overview/)

It's definitely built with Go.

------
lsh123
Built one myself sometime ago:

[https://github.com/lsh123/stats-rrdb](https://github.com/lsh123/stats-rrdb)

An important part for me was the desire to completely separate data and UX (so
no Graphite). An added bonus is the ability to control resources (e.g.
memory/disk usage).

We've been running it in production for quite some time, processing hundreds
of data updates per second and tens of queries per minute.

------
ankit84
WOW.. the sandbox shows passwords in the URL!!!

[http://sandbox.influxdb.org:9062/#/?username=ankit&password=...](http://sandbox.influxdb.org:9062/#/?username=ankit&password=ankit&database=ankit)

------
knodi
Can't wait to use it. Loving all these new DBs written in Go.

------
fit2rule
I don't seem to be able to log in to their playground - is anyone else able to
register a new account? I just get "invalid username/password" no matter what
I enter.

Other than that, I look forward to evaluating this.. maybe it's a solution for
a problem I have at the moment, where I'm collecting massive log files from
operating systems and need to navigate/parse/analyze them.. so I guess I'd
import the logs into InfluxDB and put a d3.js frontend on it..

~~~
andyhmltn
I get that. But what's more worrying is that my password came through in the
URL string (a GET request).

~~~
pauldix
We're no longer putting the password in the URL, but play and sandbox aren't
served over HTTPS, so the password still gets sent in the clear. As their
names imply, they're for playing around, not for real data. On a real
installation you'll want to use SSL. We'll have that built into the production
releases, or you can always have your load balancer/proxy handle it for you.

~~~
andyhmltn
That's awesome then. Great product :)

------
dysoco
Every now and then I see a new open-source, distributed, whatnot database pop
up. Now, I'm totally naive in terms of databases and distributed systems. Do
we really need all these databases? What's special about this one? Can someone
give me a summary of the main ones (Mongo, Redis, Rethink, Riak, etc.)?

I'm not discouraging InfluxDB or anything; as a systems programming fan it's
great to see more things like this coming, and as a Gopher too.

~~~
jacques_chester
> _Do we really need all these databases? What's special about this one?_

Good point, Comrade. I will propose to GOSPLAN that we rationalise the
development of all new technologies, to avoid such accidental evolutionary
convergence in future.

~~~
dysoco
As I said, I'm not discouraging InfluxDB or anything. I have nothing against
freedom of choice (that's why I use Linux, for example), but I do agree that
fragmentation can be bad (e.g. 200 Linux distros).

------
vosper
This looks like the exact feature-set we need at my company; we're in the
middle of moving to Redshift but I'll be keeping an eye on Influx.

I know it's early days, but I didn't see any information about cluster
management - how does one set up an Influx cluster, can it be resized, and
what kind of hardware does it prefer?

~~~
pauldix
We're building out that portion right now. There will be a web interface for
managing the cluster. We'll benchmark it on cloud configs on different sizes
with regular spinning disks, EBS, and SSDs.

The goal with the cluster stuff is that it should be possible to add nodes to
the cluster, but the storage part of it isn't highly elastic. Meaning, you
won't be adding and removing instances from it frequently. So adding nodes
will require you to go into the admin interface, activate them, then wait up
to half a day for rebalancing to be complete (but the cluster will be
available for reads and writes during this time). However, we will be
optimizing for the case of replacing a failed or soon to be shut down node.

If you're serious about giving it a try when we have the clustered version
available, shoot me an email: paul@pauldix.net. Would definitely like to hear
more about your use case.

------
marcrosoft
Very cool. I actually made a basic version of this (only increment is
implemented) in Go for the same reasons: just drop it on a server and run it.

My implementation would output a chart given parameters:

/chart.png?metric=whatever&time=12h&interval=10m

Are there any plans for easy output of graphs?

~~~
pauldix
We'd definitely like to do that soon. That's one of the really nice things
about Graphite and it makes it easy to share in emails, chat rooms, etc. For
the moment we're focused on the other parts of the API and building out the
clustering part of it.

------
krat0sprakhar
Whenever I see these new DBs I promise myself to try them in a side-project
but I almost never get around to thinking of one. Can someone be kind enough
to hit up a few ideas for side-project where this DB would shine?

~~~
lcampbell
I wouldn't say InfluxDB (or Graphite, or whatever) is something you'd develop
a side project around (though you could probably implement some novel data
visualization), but rather, they provide a backend for your side projects to
collectively aggregate metrics.

------
kylemathews
Re: the API: for JavaScript anyway, I think a chainable/fluent interface with
method names modeled after Underscore.js would be grand.

~~~
pauldix
Do you have a link to some specific chainable/fluent code that looks like
what you're thinking of?

------
bovermyer
The sandbox appears to be down, but I'm a little concerned about the security.
Can database users be created that only have read access?

~~~
pauldix
You can limit read and write access on users. It's documented on this page:
[http://influxdb.org/docs/api/http.html](http://influxdb.org/docs/api/http.html).
We haven't implemented the specific column limits part of that API yet so
feedback would be great. Does that take care of the use case you were thinking
of?

However, security is probably not something to bother with in sandbox since
it's not HTTPS. We're looking at it now, and should have it back up in a bit.

~~~
bovermyer
Yes, that would handle it. I look forward to playing with the sandbox.

~~~
pauldix
[http://play.influxdb.org](http://play.influxdb.org) is back up

------
morgante
Have you looked into StatsD support? At the very least, a backend for StatsD
(to write into InfluxDB) would make adoption a lot easier.

~~~
pauldix
It's definitely high up on the todo list. First we'll finalize the API and
release production-worthy builds, then all those additional little add-ons!

------
otterley
I like the fact that data is dimensional.

What's the scalability model? It's not clear from the documentation.

~~~
pauldix
We're working on the clustered version now. The short answer is that data
points are sharded across the cluster and replicated according to a
per-database replication factor (RF). Any given query hits (number of nodes /
RF) nodes. So writes scale horizontally and queries balance across the
cluster.
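The placement scheme described above can be sketched roughly as follows. This is a generic hash-ring illustration under the assumptions in the comment (shard by series key, replicate to RF nodes), not InfluxDB's actual placement code:

```python
import hashlib

def shard_owners(key, nodes, replication_factor):
    """Pick the nodes that own a point: hash the series key onto the
    node ring, then take the next RF consecutive nodes as replicas."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    start = digest % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

nodes = ["n1", "n2", "n3", "n4"]
owners = shard_owners("cpu.load,host=a", nodes, replication_factor=2)
# A read for this series only needs one of the two replicas, so a query
# scanning every shard touches at most len(nodes) / RF = 2 of the 4 nodes,
# which is the "# nodes / RF" figure from the comment.
```

Writes go to all RF owners (scaling horizontally as nodes are added), while reads can be balanced across replicas.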

------
vinceguidry
Is there someone out there maintaining a list of the various databases and the
use cases for each?

------
genericacct
Sounds interesting; I am currently using HBase for similar purposes. Do
"tables" have to be created explicitly, or can I just store a value into a
time series and have the series created if it doesn't exist yet?

~~~
pauldix
You can just write data in on the fly. Time series get created when you write
the first point. You also can create new columns on the fly. And there's no
enforcement of a data type across all values for a given column. That's on the
user.
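A sketch of what such a schemaless write looks like, assuming the JSON series format of InfluxDB's early HTTP API; the series name and columns below are invented for illustration:

```python
import json

# One write creates the series, its columns, and the point all at once --
# there is no CREATE TABLE step. (Format per the early InfluxDB HTTP API;
# "page_views" and its columns are hypothetical.)
payload = [{
    "name": "page_views",
    "columns": ["time", "path", "duration_ms"],
    "points": [
        [1385070000, "/home", 12],
        [1385070001, "/about", 7],
    ],
}]
body = json.dumps(payload)
# A client would POST this body to the write endpoint for its database;
# a later write may add new columns, and value types are not enforced.
```

Since types are not enforced per column, it is up to the client to keep values consistent, as the comment notes.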

~~~
genericacct
Great. I am looking at integrating it into my fork of EtherCalc so I can store
not just the metrics but also some of the indicators and statistics I compute
on them. For my use cases (R and spreadsheets) it would be handy if I could
get results in CSV format straight from the API when I make a query.

~~~
pauldix
We'll add the CSV response to the todo list. You're not the first person to
request it and it makes total sense.

------
nornagon
This is built as a round-robin DB, yes?

~~~
Loic
Looking at the code[1], LevelDB is used for the datastore. This is using the
LevelDB Go bindings (glue in C).

[1]: [https://github.com/influxdb/influxdb/blob/master/src/datastore/leveldb_datastore.go](https://github.com/influxdb/influxdb/blob/master/src/datastore/leveldb_datastore.go)

