
InfluxDB 1.0 GA Released: A Retrospective and What’s Next - pauldix
https://www.influxdata.com/influxdb-1-0-ga-released-a-retrospective-and-whats-next/
======
zenlot
When talking about InfluxDB, people should not forget that they released
clustering in beta to attract more users and then made it enterprise-only. And
no, "it's the only way to get paid in open source" is not an argument. Users
should be careful when adopting InfluxDB, as InfluxData does not clearly lay
out its plans for the product.

~~~
vegabook
This is becoming problematic across the open source DB world. Witness the
graph capabilities of DataStax Enterprise on top of Cassandra; Riak TS also
took quite a while to open source the "TS" version, hoping, I guess, to
monetize without having to open source it. This makes me worried about
adopting in case I later get gotcha'd. I am really looking forward to a truly
open source time series database. It's not that I mind upgrading to
enterprise, or paying for services, but the prices end up being so "Oracle"
like. For example, DataStax wants something like 10k per node per year! I mean,
a credible Cassandra cluster is going to be 5 nodes bare minimum, which really
is a lot for a bootstrapping startup.

One of the big problems in the TS world is that KX (KDB) charges an enormous
fortune and all the wannabe competitors are salivating at grabbing some of
that money.

~~~
jnordwick
Why does the open source world struggle with time series / tick databases so
much? I'm a very big KDB fan, but I thought there would be some competition
from the open source people at some point; instead, it seems like every
attempt fails. KDB does so well because of its simplicity. Can the OSS people
not do simple (this is a possible argument)? Or is it that, as you point out,
whenever something is about to be released into the OSS sphere, the lure of
money prevents a full release? Or are they too distracted with the Web,
building too many solutions tailored to it? I'm just amazed that a good TSDB
hasn't come from the OSS crowd yet.

~~~
snowwindwaves
Here [0] is a good blog post and spreadsheet comparing the various open source
time series databases.

[0]: [https://blog.dataloop.io/top10-open-source-time-series-datab...](https://blog.dataloop.io/top10-open-source-time-series-databases)

------
linsomniac
I've been using InfluxDB for almost a year now. At one point, around 9 months
ago, I had given up on it because it was a bit crashy and the database was
just growing too fast. But the promise behind it was too compelling, so I
started experimenting with newer versions around 6 months ago, and it has been
just great! Much easier to deal with than Graphite/collectd/carbon, Telegraf
has not been eating our servers like collectd was, CPU usage is way down...
Loving InfluxDB. Still need to implement annotations and SNMP polling in
Telegraf, but it is awesome. We are even pushing some application stats into
it.

~~~
pauldix
We updated the SNMP plugin a few weeks ago, thanks to a contributor who has
been super helpful. Does the updated one do what you need?

~~~
imaginenore
Please pay the guy. He is helping your product.

~~~
rrhyne
So, super obvious, but I thought I'd just mention that InfluxDB is open
source. It's likely the free version helped the contributor with his own
issue, so he contributed back.

~~~
imaginenore
You don't think the enterprise users can use his work?

~~~
grey-area
The implication of an MIT license is that _anyone_ can use or sell your work
with attribution. If the contributor was not happy with that, or unhappy with
getting a free db worked on by other people, they simply wouldn't contribute.
Since they did contribute on those terms, we have no reason to think they want
to be paid.

[https://github.com/influxdata/influxdb/blob/master/LICENSE](https://github.com/influxdata/influxdb/blob/master/LICENSE)

------
scrollaway
Influx is absolutely amazing. We're using it along with grafana to store and
display our desktop and web apps' analytics (it completely replaced GA), store
and display HTTP health analytics (piping custom uwsgi request logger into UDP
input), and do continuous analysis of Hearthstone games.

It's incredibly fast and the grafana/influx/telegraf stack is really cool to
play with. Highly recommended.

~~~
chrissnell
Want to hear more about your HTTP analytics. Particularly interested in
collecting selenium/sitespeed.io-style page timing metrics. There are a
million commercial solutions for this but so few open source options.

~~~
scrollaway
Glad you asked! I'm pretty proud of this hack :)

uWSGI has a very flexible logging system. You can create a log with just
requests, and completely customize the format of the line.

On top of that, uWSGI supports creating multiple log targets, and supports
logging directly into a UDP socket.

If you make it match InfluxDB's line format
([https://docs.influxdata.com/influxdb/v1.0/write_protocols/li...](https://docs.influxdata.com/influxdb/v1.0/write_protocols/line/)),
and you set up InfluxDB to accept connections from a matching UDP socket, then
uWSGI can essentially log every request directly into influx.

Like this for example:

    logformat = uwsgirequest,host="%(host)",status=%(status) msecs=%(msecs),size=%(size) %(epoch)000000000
    logger = file:/var/logs/uwsgi/uwsgi.log
    req-logger = file:/var/logs/uwsgi/requests.log
    req-logger = socket:127.0.0.1:8083

You can also add 'path="%(var.PATH_INFO)"' to that but there is a potential
data injection vulnerability if you don't re-parse the output.
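
For completeness, the receiving side is a UDP listener in influxdb.conf;
roughly something like the following sketch (the database name here is just an
example, and the port must match whatever the req-logger socket points at):

```toml
[[udp]]
  enabled = true
  bind-address = ":8083"
  database = "uwsgi"
```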

With that, you can create nice graphs with Grafana to analyze your loading
times, error rates, page sizes etc:

[https://i.imgur.com/fzbXgUd.png](https://i.imgur.com/fzbXgUd.png)

You can also set up Kapacitor to identify when your page load times are
abnormally increasing. We had the idea after a database connection leak
increased our average loading times by 300%.

Do note that this doesn't batch inputs, so 1 request = 1 connection to influx.
But it's pretty easy to put a middleware agent in between those to do
validation and batching.
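
Such a middleware agent can be quite small. A minimal sketch (my own, not part
of the setup above; all addresses, ports, and the validation heuristic are
assumptions) that listens on one UDP socket, sanity-checks each line, and
forwards points to InfluxDB in batches:

```python
import socket

# Both addresses are hypothetical -- match them to your own setup.
UWSGI_ADDR = ("127.0.0.1", 8083)   # where uWSGI's req-logger sends lines
INFLUX_ADDR = ("127.0.0.1", 8089)  # InfluxDB's UDP listener
BATCH_SIZE = 100

def looks_like_line_protocol(line):
    """Crude sanity check: 'measurement[,tags] fields [timestamp]'.

    A real validator would also handle escaped spaces inside tag and
    field values, which this deliberately ignores.
    """
    parts = line.strip().split(b" ")
    return 2 <= len(parts) <= 3 and b"=" in parts[1]

def relay():
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind(UWSGI_ADDR)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    batch = []
    while True:
        datagram, _ = recv.recvfrom(65536)
        if looks_like_line_protocol(datagram):
            batch.append(datagram.strip())
        if len(batch) >= BATCH_SIZE:
            # The line protocol accepts many points separated by newlines.
            send.sendto(b"\n".join(batch), INFLUX_ADDR)
            batch.clear()

if __name__ == "__main__":
    relay()
```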

The nice thing about all this is that it's application-level timing metrics.
There aren't a lot of ways to get that, except with application middleware,
which can't always catch everything you want.

~~~
chrissnell
Sweet. Since it's UDP, it's connectionless so any penalty for not batching
data points is likely minor. We're not a Python/uWSGI shop, but it looks like
I might be able to emulate the Influx format as you have done using this nginx
logging module:

[https://github.com/vkholodkov/nginx-udplog-module](https://github.com/vkholodkov/nginx-udplog-module)

------
Rapzid
We are using InfluxDB to calculate SLO performance. Currently, among other
things, we process about 30M ELB log entries into InfluxDB per day; it
handles this easily, of course. Here are some musings for those interested,
based on 0.9:

Wins:

* The new storage engine is very, very cool. Would love to work on this thing. It's fast and space efficient.

* Built in support for time bucketing GROUP's

* Grafana integration is pretty good

* Writes come back after the data is stored; makes it easy to create durable, idempotent processing pipelines.

Woahs:

* Unable to combine measurements in the same query; needs ETL with continuous queries or external tools

* No support for subqueries; more ETL

* Stream processing is a little lacking: you can't group on values, and high-cardinality tags make the DB explode. High cardinality is being worked on, but IDK how high it will go; plus, the storage engine already serves up streams of time-sorted data, so Samza that stuff up.

* Random crashes but the DB gets along fine when it comes back up

* Compactions use LOTS of RAM. Supposedly this can be tweaked and has been improved for 1.0

* Backfill queries with lots of points seem to use a crazy amount of RAM when bucketing on narrow time windows

Overall it's chugging along quite well. Most of the query limitations we are
able to solve with a combination of continuous queries and AWS lambda
functions kicked off by CloudWatch Events.

------
gorodetsky
I still don't quite understand Chronograf: I know that you want to own the
stack, but are there any major advantages over Grafana?

Sorry if I'm being ignorant but I couldn't find anything that would've made me
think one way or another.

~~~
pauldix
We're working on a re-envisioned Chronograf. The goal is to have something
that's complementary to Grafana. Most of our users love Grafana and that's
good.

The next version of Chronograf, coming later this year, will be a re-
envisioned and fully open source version. It won’t be about dashboards, it’ll
be about an out of the box user experience for monitoring containers,
Kubernetes, and Docker Swarm.

We're actually looking for early testers that want to walk through wireframes
and work with us on making a great out of the box experience for what will be
a fully open source monitoring stack.

~~~
zenlot
Until you decide to make some of the features enterprise-only, as you did
with InfluxDB?

~~~
dominotw
Just exploring this for devops at work. This comment makes me a little
worried. Is there a risk that some features will go 'enterprise only' in
future?

~~~
pauldix
Our enterprise offering is for HA and scale out clusters of InfluxDB.

InfluxDB single server, Telegraf, Kapacitor single server, and soon Chronograf
are all open source.

We'll continue to heavily develop our open source projects in addition to
developing closed source software that we can license to customers. Basically,
to be able to continue open source development, we need to have paying
customers.

~~~
zenlot
I don't understand the part about paying customers. Why couldn't you keep
developing clustering for the open source version, as was first promised, and
still have paying customers? There are many working examples of companies
doing this. Also, the way you handled the open source/enterprise split on
clustering is not a good example for OSS. At least some explanation or vision
could have been provided afterwards.

------
LogicX
We've been using Influx since 0.9 in production. We had a few bumps with
cardinality growing out of control, but now that we're working around those
limits, it's going well. Looking forward to that being tackled in upcoming
releases.

~~~
pauldix
Yep, we're very focused on solving the cardinality problem now. See these for
some details about how we're thinking about it:

[https://github.com/influxdata/influxdb/pull/7175](https://github.com/influxdata/influxdb/pull/7175)
[https://github.com/influxdata/influxdb/pull/7174](https://github.com/influxdata/influxdb/pull/7174)
[https://github.com/influxdata/influxdb/pull/7186](https://github.com/influxdata/influxdb/pull/7186)
[https://github.com/influxdata/influxdb/pull/7264](https://github.com/influxdata/influxdb/pull/7264)

------
otoolep
Great stuff. Congrats to everyone on the InfluxDB team on this big milestone.

~~~
pauldix
Thanks, Philip! You helped us get here :)

------
hamax
A very happy InfluxDB user here, but did you really have to title your release
email "InfluxDB 1.0 GA is Here! Plus 27x Faster than MongoDB Benchmarks"?

------
fabian2k
I played around with InfluxDB, Telegraf and Grafana a while ago, and it worked
very nicely for the basic stuff I tried.

One thing in Telegraf where I didn't figure out a good solution was a way to
parse arbitrary log files and generate data points and/or annotations from
them.

There is a particularly annoying log file format from a proprietary
application, containing data I'd like to monitor: it has time series values
in a multiline format as well as error messages. What I'd like to do is have
Telegraf tail the log file and pass it through a script that generates actual
InfluxDB data from it. So, something similar to the Telegraf tail plugin, but
with a data transformation step in between.
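
In the meantime, a small standalone script could do the tail-and-transform
itself and write straight to InfluxDB's UDP listener. A rough sketch (the log
path, the line format, the regex, and the measurement/field names are all made
up for illustration; a real version would handle the multiline format):

```python
import re
import socket
import time

LOGFILE = "/var/log/app/instrument.log"  # hypothetical path
INFLUX_ADDR = ("127.0.0.1", 8089)        # assumed InfluxDB UDP listener

# Pretend the proprietary log contains single lines like:
#   2016-09-08 12:00:00 TEMP sensor=3 value=21.5
LINE_RE = re.compile(r"TEMP sensor=(\d+) value=([\d.]+)")

def to_line_protocol(line):
    """Turn one recognized log line into an InfluxDB line-protocol point."""
    m = LINE_RE.search(line)
    if m is None:
        return None  # error messages, continuation lines, etc.
    return "temperature,sensor={} value={}".format(m.group(1), m.group(2))

def tail(path):
    """Yield lines appended to the file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(0.5)

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for line in tail(LOGFILE):
        point = to_line_protocol(line)
        if point is not None:
            sock.sendto(point.encode("utf-8"), INFLUX_ADDR)
```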

~~~
pauldix
We added log parsing to Telegraf a month or so ago. See here for more details:

[https://github.com/influxdata/telegraf/tree/master/plugins/i...](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/logparser)
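
A minimal config sketch for it looks roughly like this (the file path,
pattern, and measurement name below are just examples; see the plugin README
for the built-in grok patterns):

```toml
[[inputs.logparser]]
  files = ["/var/log/nginx/access.log"]
  from_beginning = false
  [inputs.logparser.grok]
    patterns = ["%{COMBINED_LOG_FORMAT}"]
    measurement = "nginx_access_log"
```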

~~~
joshpadnick
I'm wondering if Telegraf is a legit replacement for logstash or fluentd for
shipping logs. I couldn't find this info in the docs, so do you know if:

\- it supports multi-line logs (e.g. java stack traces)

\- it can output to elasticsearch (didn't see an output plugin)

\- there's any solution for reading docker logs (looks like docker metrics are
supported)

\- any other critical logstash functionality missing?

If it doesn't support the classic Elasticsearch output, where are telegraf log
users typically outputting logs to?

~~~
sparrc
> \- it supports multi-line logs (e.g. java stack traces)

Unfortunately it doesn't; there hasn't been a request for it yet, but please
feel free to open an issue on the repo with any details you can bring:
[https://github.com/influxdata/telegraf/issues](https://github.com/influxdata/telegraf/issues)

> \- it can output to elasticsearch (didn't see an output plugin)

Nope, not yet:
[https://github.com/influxdata/telegraf/issues/782](https://github.com/influxdata/telegraf/issues/782)

> \- there's any solution for reading docker logs (looks like docker metrics
> are supported)

If there is a logstash "grok" pattern for parsing docker logs, then telegraf
supports it. Though it's probably worth formulating a grok pattern
specifically for telegraf that properly takes advantage of tags, fields,
measurements, etc.

> \- any other critical logstash functionality missing?

Need more user feedback to tackle this one, but feel free to open an issue
with anything that you find is missing :)
[https://github.com/influxdata/telegraf/issues](https://github.com/influxdata/telegraf/issues)

~~~
joshpadnick
Thanks for this great response. To give more color on my use case, my goal is
to ship logs off the server and to some central log aggregation platform. That
could be self-managed (e.g. Elasticsearch) or hosted (e.g. SumoLogic, Loggly).

In many cases, I'd want to get those logs from Docker containers and include
metadata on each container so I know, e.g., what app is running, the container
ID, on what host, etc.

Traditionally, tools like logstash, fluentd, and heka meet these needs.

It doesn't sound like Telegraf is quite ready to support this use case in
full, but it could certainly head in that direction.

------
aleksi
> We had been grappling with what we should do for most of August 2013 and had
> another idea that I planned to debut at Monitorama in Berlin in late
> September. The conference was all about monitoring and I thought it would be
> a good place to find a receptive audience for a new monitoring product.

What was it?

~~~
pauldix
It was going to be called Anomalous. The idea behind it was that it would be
an agent that gets deployed everywhere in your infrastructure that has
collection, time series storage (either in memory, on-disk or both), alerting,
and a basic web UI.

The agents would all call to a central service, which would have bidirectional
communication. The central service would then act as a distributed
alerting/query aggregator. So you could hit that and query a specific agent,
aggregate across multiple ones, etc.

Kind of like a fully distributed time series, monitoring, and anomaly
detection platform.

I still like the idea, but we didn't have the runway or resources to go down
that path. Open source got us much more traction, combined with the fact that
I personally like to work on OSS projects :)

------
nwmcsween
As per the article, Influx approaches the problem it's trying to solve by
jamming non-time-series data into a time series database. You'd gain much more
by simplifying things: say, a constant-increment mirrored mmap file, with a
simple KV store alongside it that maps offsets for tags and such. I would bet
this would be immensely faster, except for contrived
update-tag-on-every-entry workloads.

------
tbarbugli
We evaluated InfluxDB 6 months ago and did not move forward because cluster
mode was still in beta. Is there anyone running a cluster in production with
some decent traffic?

~~~
iddqd
They yanked the clustering from the beta and put it into their Enterprise
offering, so I guess you'll have to contact a sales rep for that kind of
information.

------
joekrie
Cool to see a database specializing in time series data.

