InfluxDB 1.0 GA Released: A Retrospective and What’s Next (influxdata.com)
136 points by pauldix on Sept 8, 2016 | 79 comments

While talking about InfluxDB people should not forget about their beta release of clustering to attract more users and then making it just enterprise. And no, it's not an argument that this is the only way to get paid in open source. Users should be careful while adopting InfluxDB as Influxdata does not clearly elaborate their plans on the product.

Yeah, that's kinda a dick move. I was waiting for 1.0 before giving it another real go (after their decision to rewrite everything for 0.9). Looks like I'll be checking out something else now, maybe Prometheus[1] or DalmatinerDB[2] will fit the bill.

1. https://prometheus.io/ 2. https://dalmatiner.io/

It's worth noting that Prometheus isn't distributed either. So OSS InfluxDB or that are the same in terms of distributed capabilities.

While Prometheus's built-in local storage is not distributed, we are in the process of implementing generic interfaces to allow Prometheus to talk to decoupled distributed, horizontally scalable, durable, remote storage (or even other non-storage systems like queues etc.). There will be several implementations sitting behind this interface in the end (see already Vulcan from DO and Frankenstein from Weaveworks, both experimental right now), depending on what you'd rather run and what fits your use case best.

Most importantly, Prometheus will never decide to exclude a feature for commercial reasons. As it's a 100% open-source and independent project not controlled by any single company, we don't have those kinds of incentives. We just want to build the best monitoring system. Note that we still say "no" to a lot of features, but that's only for technical reasons, to avoid bloat, or to keep the project maintainable.

Prometheus can be scaled with sharding and there are different solutions coming out to address this, like a proxy layer for queries.

See the PromCon 2016 videos for latest news on this.
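
To make the sharding idea concrete, here is a minimal sketch that assumes plain hash-mod assignment of scrape targets to independent servers. The function names and target strings are made up for illustration, and this is not a claim about how Prometheus or any proxy layer actually implements it.

```python
import hashlib


def shard_for(target, num_shards):
    """Map a target name to a shard index with a stable hash."""
    # md5 is used only as a stable, well-distributed hash, not for security.
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % num_shards


def assign(targets, num_shards):
    """Split targets into num_shards lists, one list per server."""
    shards = [[] for _ in range(num_shards)]
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards
```

Each server then scrapes only its own list, and a query that spans shards has to be fanned out and merged by something in front, which is the role a query proxy would play.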

And it's worth noting that dalmatiner magic relies on ZFS + riak. The ZFS part being problematic to run on linux.

There are currently no open source monitoring solutions that support more than a single node.

If you have serious infrastructure, you gotta move to the paid SaaS monitoring solutions, which are ALL awesome: https://www.datadoghq.com/ and https://signalfx.com/

Note: I am not affiliated with any of these services. I just have a very strong interest in nagios/icinga/riemann/graphite/grafana dying since my painful experience of setting up and maintaining them.

We're looking into Cyanite as an alternative. It speaks the various Graphite protocols and feeds data into Cassandra, which takes care of the HA concerns. I haven't built it yet but I'm fairly certain we can run it in Kubernetes to make the entire stack HA.


We dropped InfluxDB after they introduced their enterprise version. Currently, we are running several nodes of Riemann/Cyanite/Cassandra. It works very well, and integration is pretty straightforward.

That stack would perform even better if you swapped ScyllaDB for Cassandra.

Do you have production experience with ScyllaDB?

I wanted to use this for a while. Can we consider it 100% compatible and can be used as a drop-in replacement?

I've not yet used it in production, but I have kicked the tires enough to know that a) it's (much) faster, and b) I'm planning a production deployment this fall.

For the most part it's compatible with Cassandra out-of-the-box, i.e. many well-known frameworks that run on top of Cassandra "just work". The main thing it's missing is support for Lightweight Transactions (scheduled for the 2.0 release).

The dev team behind ScyllaDB is rock solid, very competent.

You also have http://gnocchi.xyz

This is becoming problematic across the open source DB world. Witness the graph capabilities of Datastax Enterprise on top of Cassandra, and Riak TS also took quite a while to open source the "TS" version, hoping I guess to monetize without having to open source. This makes me worried about adopting in case I later get gotcha'd. I am really looking forward to a truly open source time series database. It's not that I mind upgrading to enterprise, or paying for services, but the prices end up being so "oracle" like. For example Datastax wants something like 10k per node per year! I mean a credible Cassandra cluster is going to be 5 nodes bare minimum, which really is a lot for a bootstrapping startup.

One of the big problems in the TS world is that KX (KDB) charges an enormous fortune and all the wannabe competitors are salivating at grabbing some of that money.

Why does the open source world struggle with timeseries / tick databases so much? I'm a very big KDB fan, but I thought there would be some competition from the open source people at some point, but it seems like every attempt fails. KDB does so well because of its simplicity. Can the OS people not do simple (this is a possible argument), or is it that, as you point out, whenever something is about to be released into the OS sphere the lure of money prevents a full release? Or are they too distracted with the Web and build too many solutions tailored to it? I'm just amazed that a good TSDB hasn't come from the OS crowd yet.

Here [0] is a good blog post and spreadsheet comparing the various open source time series databases.

[0]: https://blog.dataloop.io/top10-open-source-time-series-datab...

KX seems to have a story that is very convincing specifically for financial markets applications.

All the other OSS TSDBs seem to have very good stories for storing server statistics and web clicks, or IoT data; but there are few stories, case studies, best practices for using these in financial applications?

It seems to me that specialised tools will usually be released as a product, because they're only for a specialised niche.

Relational databases are treated like they can store any kind of data, and for small data sets, it really doesn't matter if your data-model does not fit the relational type.

Once you have a larger data set, you're no longer a general purpose user.

Mapd looks really cool but as you say...disincentivized to open source.

What about Druid? It is largely supported by Metamarkets and Yahoo, companies which are not trying to monetise the db itself.

What is KX KDB?

A 'high performance historical time-series columnar database, an in-memory compute engine, and a real-time streaming processor all unified with expressive query and programming language options'

Extensively used in capital markets.


Interesting to notice that you're the only one to comment who doesn't have an answer from pauldix

I've been using InfluxDB for almost a year now. At one point around 9 months ago I had given up on it because it was a bit crashy, and the database just was growing too fast. But the promise behind it was too compelling and I started experimenting with newer versions around 6 months ago and it has been just great! Much easier to deal with than Graphite/collectd/carbon, telegraf has not been eating our servers like collectd was, CPU usage is way down... Loving InfluxDB. Still need to implement annotations and SNMP polling in telegraf, but it is awesome. We are even pushing some application stats into it.

We updated the SNMP plugin a few weeks ago. Was thanks to a contributor who has been super helpful. Does the updated one do what you need?

Please pay the guy. He is helping your product.

So, super obvious, but thought I'd just mention that influxdb is open source. It's likely that the free version helped the contributor with his issue, so he contributed back.

You don't think the enterprise users can use his work?

The implication of an MIT license is that anyone can use or sell your work with attribution. If the contributor was not happy with that, or unhappy with getting a free db worked on by other people, they simply wouldn't contribute. Since they did contribute on those terms, we have no reason to think they want to be paid.


Even if he contributed, I think it's likely that InfluxDB helped the contributor more than he helped InfluxDB.

'the guy'? 'he'?

For every woman on github there are 18.3 men. So yes, "he" is way, way more likely.

Influx is absolutely amazing. We're using it along with grafana to store and display our desktop and web apps' analytics (it completely replaced GA), store and display HTTP health analytics (piping custom uwsgi request logger into UDP input), and do continuous analysis of Hearthstone games.

It's incredibly fast and the grafana/influx/telegraf stack is really cool to play with. Highly recommended.

Want to hear more about your HTTP analytics. Particularly interested in collecting selenium/sitespeed.io-style page timing metrics. There are a million commercial solutions for this but so few open source options.

Glad you asked! I'm pretty proud of this hack :)

uWSGI has a very flexible logging system. You can create a log with just requests, and completely customize the format of the line.

On top of that, uWSGI supports creating multiple log targets, and supports logging directly into a UDP socket.

If you make it match InfluxDB's line format (https://docs.influxdata.com/influxdb/v1.0/write_protocols/li...), and you set up InfluxDB to accept connections from a matching UDP socket, then uWSGI can essentially log every request directly into influx.

Like this for example:

    logformat = uwsgirequest,host="%(host)",status=%(status) msecs=%(msecs),size=%(size) %(epoch)000000000
    logger = file:/var/logs/uwsgi/uwsgi.log
    req-logger = file:/var/logs/uwsgi/requests.log
    req-logger = socket:
You can also add 'path="%(var.PATH_INFO)"' to that but there is a potential data injection vulnerability if you don't re-parse the output.
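
One way to do that re-parsing is to escape the request-derived values before they reach influx. This is a minimal sketch, not the setup described above: `escape_tag` and `request_line` are hypothetical helper names, and the escaping rules reflect my reading of the line protocol docs (commas, equals signs and spaces are backslash-escaped in tag values, and tag values are written without quotes).

```python
def escape_tag(value):
    """Escape characters that are special in line protocol tag values."""
    # Newlines would start a new point, so drop them outright.
    value = value.replace("\n", "").replace("\r", "")
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def request_line(host, status, path, msecs, size, epoch_seconds):
    """Build one 'uwsgirequest' point with a nanosecond timestamp."""
    tags = "host={},status={},path={}".format(
        escape_tag(host), int(status), escape_tag(path)
    )
    fields = "msecs={},size={}".format(int(msecs), int(size))
    return "uwsgirequest,{} {} {}".format(tags, fields, int(epoch_seconds) * 10**9)
```

With this, a path such as `/a b,c=d` stays a single tag value instead of being read as extra tags or fields.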

With that, you can create nice graphs with Grafana to analyze your loading times, error rates, page sizes etc.


You can also set up Kapacitor to identify when your page load times are abnormally increasing. We had the idea after a database connection leak increased our average loading times by 300%.

Do note that this doesn't batch inputs, so 1 request = 1 connection to influx. But it's pretty easy to put a middleware agent in between those to do validation and batching.
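
The batching half of such an agent could look roughly like this. It is a sketch under assumptions: `LineBatcher` is a made-up name, `send` is whatever callable forwards a newline-joined batch to influx, and validation and time-based flushing are left out.

```python
class LineBatcher:
    """Collect line protocol lines and forward them in batches."""

    def __init__(self, send, max_lines=100):
        self.send = send            # callable taking one newline-joined string
        self.max_lines = max_lines  # flush once this many lines are buffered
        self.buffer = []

    def add(self, line):
        line = line.strip()
        if not line:
            return  # ignore empty datagrams
        self.buffer.append(line)
        if len(self.buffer) >= self.max_lines:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send("\n".join(self.buffer))
            self.buffer = []
```

The agent would call `add()` for every datagram it receives from uWSGI and `flush()` on a timer, so that a quiet period does not leave points sitting in the buffer.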

The nice thing about all this is it's application level timing metrics. There's not a lot of ways to get that, except with application middleware which can't always catch everything you want.

Sweet. Since it's UDP, it's connectionless so any penalty for not batching data points is likely minor. We're not a Python/uWSGI shop, but it looks like I might be able to emulate the Influx format as you have done using this nginx logging module:


I am new to InfluxDB but I am interested in using it for some desktop app analytics. Are there any resources that you would recommend I start with? Maybe some from your own experience? How hard was it for you to start using it for desktop app analytics?

Logging datapoints into influx is pretty straightforward. For example, logging every time your app is started, adding a tag for the platform, the region, logging datapoints whenever some important settings are toggled, etc.
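
As a rough illustration, an app-start event can be written with nothing but the standard library. The measurement, tag and field names and the `analytics` database are invented for the example; the only thing taken from InfluxDB itself is the 1.x HTTP `/write?db=...` endpoint that accepts line protocol in the request body.

```python
import urllib.parse
import urllib.request


def app_start_point(platform, region, version):
    """Line protocol for one app start; the server assigns the timestamp."""
    # Assumes platform and region are simple identifiers with no spaces,
    # commas or equals signs, so no escaping is done here.
    return 'app_start,platform={},region={} version="{}",count=1i'.format(
        platform, region, version
    )


def build_write_request(base_url, database, line):
    """Prepare (but do not send) a POST to the /write endpoint."""
    query = urllib.parse.urlencode({"db": database})
    url = "{}/write?{}".format(base_url.rstrip("/"), query)
    return urllib.request.Request(url, data=line.encode("utf-8"), method="POST")
```

Sending it is then `urllib.request.urlopen(build_write_request("http://localhost:8086", "analytics", app_start_point("windows", "eu", "1.2.3")))`, ideally off the UI thread and with failures ignored so analytics never breaks the app.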

Once you have this data, set up grafana and build some graphs to query it. Grafana has a graphical query builder, and the query language underneath is essentially SQL-like. There are good tutorials in the influx/grafana docs for all that.

We are using influxDB to calculate SLO performance. Currently, among other things, we process about 30M ELB logs entries into influxdb per day; it handles this easily of course. Here are some musings for those interested based on 0.9:


* The new storage engine is very, very cool. Would love to work on this thing. It's fast and space efficient.

* Built-in support for time bucketing in GROUP BY

* Grafana integration is pretty good

* Writes come back after the data is stored; makes it easy to create durable, idempotent processing pipelines.


* Unable to combine measurements in the same query; needs ETL with continuous queries or external tools

* No support for subqueries; more ETL

* Stream processing is a little lacking -> can't group on values and high cardinality tags make the DB explode; high cardinality is being worked on but IDK how high it will go, plus I mean the storage engine serves up streams of time-sorted data so samza that stuff up.

* Random crashes but the DB gets along fine when it comes back up

* Compactions use LOTS of RAM. Supposedly this can be tweaked and has been improved for 1.0

* Backfill queries with lots of points seem to use a crazy amount of RAM when bucketing on narrow time windows

Overall it's chugging along quite well. Most of the query limitations we are able to solve with a combination of continuous queries and AWS lambda functions kicked off by CloudWatch Events.
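
For anyone unfamiliar with the time bucketing mentioned in the list, this is roughly what a query like `SELECT mean("value") FROM "latency" WHERE time > now() - 1h GROUP BY time(1m)` computes, written out in plain Python. The measurement name is hypothetical and this is an illustration of the idea, not how the storage engine does it.

```python
from collections import defaultdict


def mean_by_time_bucket(points, bucket_seconds):
    """Average (timestamp, value) pairs per fixed-width time bucket."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for ts, value in points:
        start = ts - (ts % bucket_seconds)  # floor to the bucket boundary
        acc = sums[start]
        acc[0] += value
        acc[1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}
```

Narrow buckets over a long backfill mean a lot of accumulators alive at once, which is one plausible reason those queries are memory hungry.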

I still don't quite understand Chronograf: I know that you want to own the stack but are there any major advantages over Grafana?

Sorry if I'm being ignorant but I couldn't find anything that would've made me think one way or another.

We're working on a re-envisioned Chronograf. The goal is to have something that's complementary to Grafana. Most of our users love Grafana and that's good.

The next version of Chronograf, coming later this year, will be a re-envisioned and fully open source version. It won’t be about dashboards, it’ll be about an out of the box user experience for monitoring containers, Kubernetes, and Docker Swarm.

We're actually looking for early testers that want to walk through wireframes and work with us on making a great out of the box experience for what will be a fully open source monitoring stack.

Until you decide to make some of the features just for enterprise, as you did with InfluxDB?

Just exploring this for devops at work. This comment makes me a little worried. Is there a risk that some features will go 'enterprise only' in future?

Our enterprise offering is for HA and scale out clusters of InfluxDB.

InfluxDB single server, Telegraf, Kapacitor single server, and soon Chronograf are all open source.

We'll continue to heavily develop our open source projects in addition to developing closed source software that we can license to customers. Basically, to be able to continue open source development, we need to have paying customers.

I don't understand the part of paying customers. Why couldn't you further develop clustering for open source version, as it was first promised and still have paying customers? There are many working examples doing this. Also, this is not a good example for OSS to do as you did with open source/enterprise and clustering bit. At least some explanation or vision could have been provided afterwards.

Who knows, they've done it before so they could potentially do it again.

We've been using influx since 0.9 in production. Had a few bumps with cardinality growing out of control, but now working around those limits, it's going well. Looking forward to that being something tackled with upcoming releases.
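
A quick way to see how cardinality gets out of control is to count distinct series, taking a series to be one measurement plus one unique combination of tag values. That definition is my simplification, and the tag names below are invented.

```python
def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations."""
    return len({
        (measurement, tuple(sorted(tags.items())))
        for measurement, tags in points
    })


# Two hosts x two statuses stays at 4 series however many points arrive.
bounded = [
    ("req", {"host": h, "status": s})
    for h in ("a", "b")
    for s in ("200", "500")
] * 10

# A per-request tag turns every single point into its own series.
unbounded = [("req", {"host": "a", "request_id": str(i)}) for i in range(1000)]
```

Here `series_cardinality(bounded)` stays at 4 while `series_cardinality(unbounded)` grows with every request, so values like request IDs belong in fields, not tags.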

Yep, we're very focused on solving the cardinality problem now. See these for some details about how we're thinking about it:

https://github.com/influxdata/influxdb/pull/7175 https://github.com/influxdata/influxdb/pull/7174 https://github.com/influxdata/influxdb/pull/7186 https://github.com/influxdata/influxdb/pull/7264

Great stuff. Congrats to everyone on the InfluxDB team on this big milestone.

Thanks, Philip! You helped us get here :)

A very happy InfluxDB user here, but did you really have to title your release email "InfluxDB 1.0 GA is Here! Plus 27x Faster than MongoDB Benchmarks"?

I played around with InfluxDB, Telegraf and Grafana a while ago, and it worked very nicely for the basic stuff I tried.

One thing in Telegraf where I didn't figure out a good solution was a way to parse arbitrary log files and generate data points and/or annotations from them.

There is a particularly annoying log file format from a proprietary application containing data I'd like to monitor, which contains time series values in a multiline format as well as error messages. What I'd like to do is to have Telegraf tail the log file and pass it through a script that generates actual influxdb data from that. So something similar to the telegraf tail plugin, but with a data transformation in between.
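
To show the kind of transformation meant here, a minimal sketch follows. The log layout is entirely hypothetical (a `[timestamp] SAMPLE` header followed by indented `key: number` lines, with anything else ending the record), as are the `appstats` measurement name and the assumption that timestamps are UTC.

```python
import calendar
import re
import time

HEADER = re.compile(r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] SAMPLE$")
VALUE = re.compile(r"^\s+(?P<key>\w+):\s*(?P<value>-?\d+(?:\.\d+)?)$")


def parse_records(log_lines, measurement="appstats"):
    """Turn multiline SAMPLE records into line protocol strings."""
    out = []
    ts, fields = None, []

    def flush():
        # Emit the record collected so far, if it has any values.
        if ts is not None and fields:
            out.append("{} {} {}".format(measurement, ",".join(fields), ts))

    for raw in log_lines:
        text = raw.rstrip("\n")
        header = HEADER.match(text)
        if header:
            flush()
            parsed = time.strptime(header.group("ts"), "%Y-%m-%d %H:%M:%S")
            ts = calendar.timegm(parsed) * 10**9  # seconds to nanoseconds
            fields = []
            continue
        value = VALUE.match(text)
        if value and ts is not None:
            fields.append("{}={}".format(value.group("key"), value.group("value")))
        elif not value:
            # Error messages and other lines end the current record.
            flush()
            ts, fields = None, []
    flush()
    return out
```

The output could then be handed to anything that accepts line protocol; whether Telegraf can run such a script between tailing and writing is exactly the open question.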

We added log parsing to Telegraf a month or so ago. See here for more details:


I'm wondering if Telegraf is a legit replacement for logstash or fluentd for shipping logs. I couldn't find this info from docs, so do you know if:

- it supports multi-line logs (e.g. java stack traces)

- it can output to elasticsearch (didn't see an output plugin)

- there's any solution for reading docker logs (looks like docker metrics are supported)

- any other critical logstash functionality missing?

If it doesn't support the classic Elasticsearch output, where are telegraf log users typically outputting logs to?

> - it supports multi-line logs (e.g. java stack traces)

unfortunately it doesn't, there hasn't been a request for it yet but please feel free to open an issue on the repo with any details you can bring: https://github.com/influxdata/telegraf/issues

> - it can output to elasticsearch (didn't see an output plugin)

Nope, not yet: https://github.com/influxdata/telegraf/issues/782

> - there's any solution for reading docker logs (looks like docker metrics are supported)

If there is a logstash "grok" pattern for parsing docker logs, then telegraf supports it. Though it's probably worth formulating a grok pattern specifically for telegraf that properly takes advantage of tags, fields, measurements, etc.

> - any other critical logstash functionality missing?

Need more user feedback to tackle this one, but feel free to open an issue with anything that you find is missing :) https://github.com/influxdata/telegraf/issues

Thanks for this great response. To give more color on my use case, my goal is to ship logs off the server and to some central log aggregation platform. That could be self-managed (e.g. Elasticsearch) or hosted (e.g. SumoLogic, Loggly).

In many cases, I'd want to get those logs from Docker containers and include metadata on each container so I know, e.g. what app is running, the container Id, on what host, etc.

Traditionally, tools like logstash, fluentd, and heka meet these needs.

It doesn't sound like Telegraf is quite ready to support this use case in full, but could certainly head in that direction.

Telegraf users are generally parsing the logs into structured metrics and events that go into InfluxDB.

Not sure that it supports multi-line logs or docker logs. I'll have to look into it, but both should probably be done if they're not already there.

For ES output, we're happy to take PRs for other output plugins. There are people using Telegraf that aren't using InfluxDB. That's fine and we're happy to have an open source collector that others use and contribute to.

> We had been grappling with what we should do for most of August 2013 and had another idea that I planned to debut at Monitorama in Berlin in late September. The conference was all about monitoring and I thought it would be a good place to find a receptive audience for a new monitoring product.

What was it?

It was going to be called Anomalous. The idea behind it was that it would be an agent that gets deployed everywhere in your infrastructure that has collection, time series storage (either in memory, on-disk or both), alerting, and a basic web UI.

The agents would all call to a central service, which would have bidirectional communication. The central service would then act as a distributed alerting/query aggregator. So you could hit that and query a specific agent, aggregate across multiple ones, etc.

Kind of like a fully distributed time series, monitoring, and anomaly detection platform.

I still like the idea, but we didn't have the runway or resources to go down that path. Open source got us much more traction, combined with the fact that I personally like to work on OSS projects :)

As per the article, influx tries to solve the issue by jamming non-ts data into a ts database. You gain much more by simplifying things with, say, a constant-increment mirror mmap file plus another simple KV that maps offsets for tags and such. I would bet this would be immensely faster, except for a contrived update-tag-on-every-entry workload.
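
One possible reading of the constant-increment mmap idea, as a minimal sketch: samples arrive on a fixed time grid, so the file offset follows from the timestamp alone and no per-point timestamp is stored. The class name and layout are assumptions, and the KV side that would map tags to files or offsets is omitted.

```python
import mmap
import struct


class FixedStepSeries:
    """Float64 samples at a constant time step in a memory-mapped file."""

    SLOT = struct.Struct("<d")  # one little-endian float64 per sample

    def __init__(self, path, start, step, capacity):
        self.start, self.step, self.capacity = start, step, capacity
        self._file = open(path, "w+b")
        # Pre-fill with NaN so slots that were never written are recognisable.
        self._file.write(self.SLOT.pack(float("nan")) * capacity)
        self._file.flush()
        self._map = mmap.mmap(self._file.fileno(), capacity * self.SLOT.size)

    def _offset(self, ts):
        index, remainder = divmod(ts - self.start, self.step)
        if remainder or not 0 <= index < self.capacity:
            raise ValueError("timestamp is off-grid or out of range")
        return index * self.SLOT.size

    def write(self, ts, value):
        self.SLOT.pack_into(self._map, self._offset(ts), value)

    def read(self, ts):
        return self.SLOT.unpack_from(self._map, self._offset(ts))[0]

    def close(self):
        self._map.close()
        self._file.close()
```

Reads and writes are a single offset computation, which is where the speed would come from; the cost is that irregular timestamps and changing tags do not fit this layout.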

We evaluated InfluxDB 6 months ago and did not move forward because cluster mode was still in beta. Is there anyone running a cluster in production with some decent traffic?

They yanked the clustering from the beta and put it into their Enterprise offering, so I guess you'll have to contact a sales rep for that kind of information.

Cool to see a database specializing in time series data.


We're happy to take PRs. Seems everyone has opinions on packaging and you know what it's like when everyone has opinions, right? :)

> Seems everyone has opinions on packaging and you know what it's like when everyone has opinions, right? :)

Don't try to push this into the realm of opinion, because it isn't one; it's a realm of standards. Your package doesn't pass lintian, and these aren't doesn't-really-matter errors like a missing README or license file. You don't even install the initscript in the package properly.

It wouldn't matter that much, but your build system doesn't have an easy, clean build from local source tree with a method of putting files in right places in $DESTDIR, so the software could be properly debianized (or equivalent) by somebody else.

dozzie, do you have a PR or issue open where you could help guide us to a good resolution?

No. Why would I do so if I don't use your software, given that your product feels more like a paid thing than open source? You have enough money to hire a competent sysadmin to teach you how to package software.

Well, if that's the only complaint you have, then I think Influx have done a stellar job :-)

It's the only complaint because I didn't bother looking further. It's not a sign of good engineering when the build system and packaging are a mess.

At least the RPM one seems okay. You should elaborate.

Or maybe you were sad because you only saw package downloads from S3 and you were looking for an actual Debian repository? Well, that actually already exists. See: https://docs.influxdata.com/influxdb/v1.0/introduction/insta...

I'm glad you answered your own question, but it was you who asked that. I couldn't care less if they provided APT repository; using random repositories from various companies is asking for trouble with managing your servers (hint: package retention policies).

I answered my own question because your original comment left no clue as to why you're so upset. Neither does this one that I'm now responding to.

Agreed, them providing .debs is far from "packaging being a mess". I'm quite happy that they provided .debs, worked with them to make some changes to the debs via pull-requests, and after that have been super happy with the packaging. I grab and import their packages into my own private repo for historic reasons, but it does mean I have to chase upstream changes. Personally, I think this is a straw-man argument. With many places not even providing packages, I'm glad that the Influx folks went through that process for me.

> Agreed, them providing .debs is far from "packaging being a mess". I'm quite happy that they provided .debs, worked with them to make some changes to the debs via pull-requests, and after that have been super happy with the packaging.

Clearly you haven't checked how they build their DEBs and RPMs. Some opaque, overcomplicated script that eventually calls fpm instead of proper debianization or a spec file for RPM. This results, among other things, in some configs not being marked as configs.

> I grab and import their packages into my own private repo for historic reasons, but it does mean I have to chase upstream changes. Personally, I think this is a straw-man argument.

It's not. You can't rebuild a server if you suddenly don't have access to packages this server has installed, especially if you need them to be in specific versions. BTDT, several times.

You can't rebuild a server from backups? No snark, just wondering what I'm missing.

First, you need to predict that you'll need to back up software. Typically people back up their data, as software can be reinstalled (until it can't, because of package retention policies). Then you need to ensure you either have enough backup space or don't store 30 copies of the same thing, one for each server (it quite often happens that the OS and software weigh much more than the data they host).

Second, restoring from backup limits how you can rebuild a server to just one rigid way. You can't bring another already running server to what you have elsewhere.

Sorry, I assumed you were talking about rebuilding a local mirror repository that you use to provide software for your other hosts. That would just be one backup of the packages and meta data that you can restore. Is there a reason you can't mirror the packages and repositories you use?

Just mirroring doesn't change the retention policy (unless the term has changed its meaning in the last fifteen years), so it won't do. But this moves the discussion to spoken language semantics, which I don't feel like pushing too far.

My point with the packages is that you need your own copy that you have control over, so they don't disappear unexpectedly. Pulling already-built packages from some other repository would be fine from this standpoint, though I prefer rebuilding them myself and keeping them along with the source packages.
