Most importantly, Prometheus will never decide to exclude a feature for commercial reasons. As it's a 100% open-source and independent project not controlled by any single company, we don't have those kinds of incentives. We just want to build the best monitoring system. Note that we still say "no" to a lot of features, but that's only for technical reasons, to avoid bloat, or to keep the project maintainable.
See the PromCon 2016 videos for the latest news on this.
There are currently no open source monitoring solutions that support more than a single node.
If you have serious infrastructure, you gotta move to the paid SaaS monitoring solutions, which are all awesome: https://www.datadoghq.com/ and https://signalfx.com/
Note: I am not affiliated with any of these services. I just have a very strong interest in nagios/icinga/riemann/graphite/grafana dying since my painful experience of setting up and maintaining them.
I've wanted to use this for a while. Can we consider it 100% compatible and usable as a drop-in replacement?
For the most part it's compatible with Cassandra out-of-the-box, i.e. many well-known frameworks that run on top of Cassandra "just work". The main thing it's missing is support for Lightweight Transactions (scheduled for the 2.0 release).
The dev team behind ScyllaDB is rock solid, very competent.
One of the big problems in the TS world is that KX (KDB) charges an enormous fortune and all the wannabe competitors are salivating at grabbing some of that money.
All the other OSS TSDBs seem to have very good stories for storing server statistics, web clicks, or IoT data; but there are few stories, case studies, or best practices for using them in financial applications.
Relational databases are treated like they can store any kind of data, and for small data sets, it really doesn't matter if your data-model does not fit the relational type.
Once you have a larger data set, you're no longer a general purpose user.
Extensively used in capital markets.
It's incredibly fast and the grafana/influx/telegraf stack is really cool to play with. Highly recommended.
uWSGI has a very flexible logging system. You can create a log with just requests, and completely customize the format of the line.
On top of that, uWSGI supports creating multiple log targets, and supports logging directly into a UDP socket.
If you make it match InfluxDB's line format (https://docs.influxdata.com/influxdb/v1.0/write_protocols/li...), and you set up InfluxDB to accept connections from a matching UDP socket, then uWSGI can essentially log every request directly into influx.
Like this for example:
logformat = uwsgirequest,host="%(host)",status=%(status) msecs=%(msecs),size=%(size) %(epoch)000000000
logger = file:/var/logs/uwsgi/uwsgi.log
req-logger = file:/var/logs/uwsgi/requests.log
req-logger = socket:127.0.0.1:8083
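For illustration, here's roughly what those req-logger lines produce on the wire. This is a sketch, not uWSGI internals: the helper names are made up, and the port is just the one from the example config above. It formats one request as an InfluxDB line-protocol point and fires it at the UDP listener:

```python
import socket
import time

def format_point(host, status, msecs, size, epoch):
    # line protocol: measurement,tags fields timestamp(ns)
    # mirrors the logformat above, including the quoted host tag
    # and the "000000000" suffix that turns epoch seconds into nanoseconds
    return ('uwsgirequest,host="%s",status=%d msecs=%d,size=%d %d000000000'
            % (host, status, msecs, size, epoch))

def send_point(line, addr=("127.0.0.1", 8083)):
    # one UDP datagram per request, exactly like the req-logger socket target
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(line.encode("utf-8"), addr)
    sock.close()

line = format_point("example.com", 200, 42, 1024, int(time.time()))
```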
With that, you can create nice graphs with Grafana to analyze your loading times, error rates, page sizes, etc.
You can also set up Kapacitor to identify when your page load times are abnormally increasing. We had the idea after a database connection leak increased our average loading times by 300%.
Do note that this doesn't batch inputs, so 1 request = 1 connection to influx. But it's pretty easy to put a middleware agent in between those to do validation and batching.
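Such a middleware agent could be sketched like this. Everything here is hypothetical (the listen/forward ports, the batch size, the crude validation): it buffers per-request lines arriving over UDP and forwards them to InfluxDB in batches, so InfluxDB sees one write per batch instead of one per request:

```python
import socket

BATCH_SIZE = 100  # illustrative; tune to your request rate

class LineBatcher:
    def __init__(self, size=BATCH_SIZE):
        self.size = size
        self.lines = []

    def add(self, line):
        """Buffer a line; return the joined batch once it is full, else None."""
        if line.strip():              # crude validation: drop empty lines
            self.lines.append(line.strip())
        if len(self.lines) >= self.size:
            batch, self.lines = "\n".join(self.lines), []
            return batch
        return None

def relay(listen=("127.0.0.1", 8089), forward=("127.0.0.1", 8083)):
    # point uWSGI's req-logger at `listen`, and InfluxDB's UDP input at `forward`
    batcher = LineBatcher()
    in_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    in_sock.bind(listen)
    out_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        data, _ = in_sock.recvfrom(65535)
        batch = batcher.add(data.decode("utf-8"))
        if batch is not None:
            out_sock.sendto(batch.encode("utf-8"), forward)
```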
The nice thing about all this is that you get application-level timing metrics. There aren't many ways to get those, except with application middleware, which can't always catch everything you want.
Once you have this data, set up Grafana and build some graphs to query it. Grafana has a graphical query builder, and underneath it the query language is essentially SQL-like. There are good tutorials in the Influx/Grafana docs for all of that.
* The new storage engine is very, very cool. Would love to work on this thing. It's fast and space efficient.
* Built-in support for time-bucketing GROUP BYs
* Grafana integration is pretty good
* Writes return only after the data is stored; that makes it easy to create durable, idempotent processing pipelines.
* Unable to combine measurements in the same query; needs ETL with continuous queries or external tools
* No support for subqueries; more ETL
* Stream processing is a little lacking: you can't group on values, and high-cardinality tags make the DB explode. High cardinality is being worked on, but I don't know how high it will go. Plus, the storage engine serves up streams of time-sorted data, so you could Samza that stuff up.
* Random crashes but the DB gets along fine when it comes back up
* Compactions use LOTS of RAM. Supposedly this can be tweaked and has been improved for 1.0
* Backfill queries with lots of points seem to use a crazy amount of RAM when bucketing on narrow time windows
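For anyone unfamiliar with the time-bucketing mentioned above: a GROUP BY time(60s) with mean() conceptually floors each timestamp to its window and averages per bucket. A plain-Python sketch of that idea (not InfluxDB internals):

```python
from collections import defaultdict

def bucket_mean(points, window):
    """points: iterable of (unix_ts, value); window: bucket width in seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window].append(value)   # floor to bucket start
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}

points = [(0, 1.0), (30, 3.0), (70, 10.0)]
# bucket_mean(points, 60) -> {0: 2.0, 60: 10.0}
```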
Overall it's chugging along quite well. Most of the query limitations we are able to solve with a combination of continuous queries and AWS lambda functions kicked off by CloudWatch Events.
Sorry if I'm being ignorant but I couldn't find anything that would've made me think one way or another.
The next version of Chronograf, coming later this year, will be a re-envisioned and fully open source version. It won’t be about dashboards, it’ll be about an out of the box user experience for monitoring containers, Kubernetes, and Docker Swarm.
We're actually looking for early testers that want to walk through wireframes and work with us on making a great out of the box experience for what will be a fully open source monitoring stack.
InfluxDB single server, Telegraf, Kapacitor single server, and soon Chronograf are all open source.
We'll continue to heavily develop our open source projects in addition to developing closed source software that we can license to customers. Basically, to be able to continue open source development, we need to have paying customers.
One thing I couldn't figure out a good solution for in Telegraf was a way to parse arbitrary log files and generate data points and/or annotations from them.
There is a particularly annoying log file format from a proprietary application, containing data I'd like to monitor: time series values in a multiline format as well as error messages. What I'd like is to have Telegraf tail the log file and pass it through a script that generates actual InfluxDB data from it. So, something similar to the Telegraf tail plugin, but with a data transformation step in between.
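As a sketch of what that transformation script could look like: the multiline input format below is invented for illustration (a "BEGIN <name>" line, key=value lines, then "END"), and the helper names are hypothetical. The idea is to parse records out of a stream of lines and emit InfluxDB line protocol:

```python
import re

KV = re.compile(r"^(\w+)=([0-9.]+)$")

def parse_records(lines):
    """Yield (measurement, fields) for each complete multiline record."""
    name, fields = None, {}
    for line in lines:
        line = line.strip()
        if line.startswith("BEGIN "):
            name, fields = line[len("BEGIN "):], {}
        elif line == "END" and name and fields:
            yield name, dict(fields)
            name, fields = None, {}
        else:
            m = KV.match(line)
            if m and name is not None:
                fields[m.group(1)] = float(m.group(2))

def to_line_protocol(name, fields, ts_ns):
    # measurement field1=v1,field2=v2 timestamp(ns)
    body = ",".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
    return "%s %s %d" % (name, body, ts_ns)

sample = ["BEGIN temps", "cpu=42.5", "io=1.0", "END"]
out = [to_line_protocol(n, f, 1470000000000000000)
       for n, f in parse_records(sample)]
```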
- it supports multi-line logs (e.g. java stack traces)
- it can output to elasticsearch (didn't see an output plugin)
- there's any solution for reading docker logs (looks like docker metrics are supported)
- any other critical logstash functionality missing?
If it doesn't support the classic Elasticsearch output, where are telegraf log users typically outputting logs to?
Unfortunately it doesn't; there hasn't been a request for it yet, but please feel free to open an issue on the repo with any details you can bring: https://github.com/influxdata/telegraf/issues
> - it can output to elasticsearch (didn't see an output plugin)
Nope, not yet: https://github.com/influxdata/telegraf/issues/782
> - there's any solution for reading docker logs (looks like docker metrics are supported)
If there is a logstash "grok" pattern for parsing docker logs, then telegraf supports it. Though it's probably worth formulating a grok pattern specifically for telegraf that properly takes advantage of tags, fields, measurements, etc.
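For reference, a logparser configuration with a custom grok pattern might look like the following; the file path, the pattern itself, and the field modifiers are illustrative assumptions, not a canned Docker pattern:

```toml
# Hypothetical example of Telegraf's logparser input with a custom grok pattern.
[[inputs.logparser]]
  files = ["/var/lib/docker/containers/*/*.log"]
  from_beginning = false
  [inputs.logparser.grok]
    patterns = ["%{CUSTOM_LOG}"]
    measurement = "docker_app"
    custom_patterns = '''
      CUSTOM_LOG %{TIMESTAMP_ISO8601:timestamp:ts-iso8601} %{WORD:level:tag} %{NUMBER:resp_ms:float}
    '''
```

The `:tag` and `:float` modifiers are where Telegraf's grok goes beyond logstash's: they map captures directly onto tags and typed fields instead of plain strings.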
> - any other critical logstash functionality missing?
Need more user feedback to tackle this one, but feel free to open an issue with anything that you find is missing :) https://github.com/influxdata/telegraf/issues
In many cases, I'd want to get those logs from Docker containers and include metadata on each container so I know, e.g., what app is running, the container ID, what host it's on, etc.
Traditionally, tools like logstash, fluentd, and heka meet these needs.
It doesn't sound like Telegraf is quite ready to support this use case in full, but it could certainly head in that direction.
Not sure that it supports multi-line logs or docker logs. I'll have to look into it, but both should probably be done if they're not already there.
For ES output, we're happy to take PRs for other output plugins. There are people using Telegraf that aren't using InfluxDB. That's fine and we're happy to have an open source collector that others use and contribute to.
What was it?
The agents would all call to a central service, which would have bidirectional communication. The central service would then act as a distributed alerting/query aggregator. So you could hit that and query a specific agent, aggregate across multiple ones, etc.
Kind of like a fully distributed time series, monitoring, and anomaly detection platform.
I still like the idea, but we didn't have the runway or resources to go down that path. Open source got us much more traction, combined with the fact that I personally like to work on OSS projects :)
Don't try to push this into the realm of opinion, because it's not one; it's the realm of standards. Your package doesn't pass the lintian test, and these aren't doesn't-really-matter errors like a missing README or license file. You don't even put an init script in your packages in a proper way.
It wouldn't matter that much, but your build system doesn't offer an easy, clean build from a local source tree with a way to put files in the right places under $DESTDIR, so the software could be properly debianized (or the equivalent) by somebody else.
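For context, a $DESTDIR-friendly install step is the kind of thing being asked for here: the build stages files under a tree the packager chooses instead of the live filesystem. A minimal sketch (the paths, file names, and the /tmp/stage default are made up for illustration):

```shell
set -eu
# a packager would run e.g.  DESTDIR=debian/tmp ./install.sh
DESTDIR="${DESTDIR:-/tmp/stage}"

printf 'fake binary' > myapp            # stand-in for a built artifact
printf 'key=value\n' > myapp.conf       # stand-in for a shipped config

# install -D creates the leading directories under the staging tree
install -D -m 755 myapp      "$DESTDIR/usr/bin/myapp"
install -D -m 644 myapp.conf "$DESTDIR/etc/myapp/myapp.conf"
```

With that in place, a debian/rules file (or an RPM spec's %install section) can stage everything with one call and mark /etc files as conffiles properly.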
Clearly you haven't checked how they build their DEBs and RPMs: some opaque, overcomplicated script that eventually calls fpm instead of a proper debianization or a spec file for the RPM. This results, among other things, in some configs not being marked as configs.
> I grab and import their packages into my own private repo for historic reasons, but it does mean I have to chase upstream changes. Personally, I think this is a straw-man argument.
It's not. You can't rebuild a server if you suddenly don't have access to
packages this server has installed, especially if you need them to be in
specific versions. BTDT, several times.
Second, restoring from backup limits how you can rebuild a server to just one rigid way. You can't bring another already-running server to the state you have.
My point with the packages is that you need your own copy that you have
control over, so they don't disappear unexpectedly. Pulling already-built
packages from some other repository would be fine from this standpoint, though
I prefer rebuilding them myself and keeping them along with the source packages.