Hacker News new | past | comments | ask | show | jobs | submit login
Thoughts on Time-series Databases (jmoiron.net)
226 points by akerl_ on June 30, 2015 | hide | past | favorite | 127 comments

I actually just migrated 20 million rows of Magic: the Gathering price data from influxDB to postgres this week. For a few days of effort, I decreased my query latency by an order of a magnitude; a full set query, roughly 270 cards, went from 30 to 3 seconds with a cold cache.

The migration was prompted by influxDB 0.8 eating 50% of the VPS' cpu and 77% of the ram while idling. It had no capability to index along anything but time so every query, for my use case, required a full table scan. 0.9 was supposed to fix every issue I had with it but it was due to be 'production ready' months ago.

Unless you're dealing with ingesting an absolutely insane amount of data indexed along time, I'd have to say that postgres or comparable sql database should be more comfortable, more stable, and much more mature.

EDIT: I don't want to come off as shitting all over influxDB, to its credit it barely moved beyond idle resource usage when I was stuffing it full of data.

I've found PostgreSQL to be extremely fast if you store time series in arrays (http://www.postgresql.org/docs/9.4/static/arrays.html) in a round-robin fashion. You can also limit the array size, so that you have a fixed number of points per table row (thereby splitting your series across multiple rows), and if you adjust it such that it fits on one PG page it is quite performant.

I don't think you can limit them: However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.

By "limit" I mean your code would do it, not Postgres. E.g. your series is 86400 datapoints long (seconds in a day), you would store it as 100 rows of 864-element-long arrays.

Most SQL databases can scale and handle massive amounts of time series data, especially if they have columnstore features which make scans incredibly fast... while still giving all the advantages of adhoc SQL queries.

Most purpose-built time-series stuff isn't really necessary for much of what I see people trying to use it for.

I've hit the same problem and I would like to move back to a SQL data store. However none of the nice dashboards / visualizations support postgres or any SQL database (for now)...

My question (to everyone): what do you use as replacement for kibana or grafana?

If you don't need to keep data forever, but only several weeks or months, and you only need numeric time series data and not raw event logs, Prometheus (http://prometheus.io/) is your friend. Since it's optimized towards purely numeric time series (with arbitrary labeled dimensions), it currently uses an order of magnitude less disk space than InfluxDB for this use case, and I've also heard a few reports of people's CPU+IO usage dropping drastically when they switched from InfluxDB to Prometheus for their metrics.

As dashboards for Prometheus, you can currently use PromDash (http://prometheus.io/docs/visualization/promdash/), Console HTML templates (http://prometheus.io/docs/visualization/consoles/), or Grafana (http://prometheus.io/docs/visualization/grafana/).

Durable long-term storage is still outstanding. Although replication into OpenTSDB and InfluxDB is experimentally there.

I've just implemented a custom backend for graphite-api which seems to be working ok although I don't have crazy requirements. https://github.com/brutasse/graphite-api is a cleaned up fork of graphite (which is much easier to install). I'm using grafana as the front-end and my data is in a postgresql database and graphite-api is linking them together.

Hello, I find myself having the same need. Would you agree to share your implementation or point me to it? Thank you!

If I decided to move to using a SQL data store (I use graphite now), I would re-implement the graphite API as a listener process, or write a graphite backend. The biggest strength of Graphite is how simple it is to upload and query metrics, and I wouldn't want to lose that even if the backend were to change.

Shameless plug here - The author of the article does build time-series databases for a living, and more specifically the Datadog monitoring platform - which will gladly collect your millions of metrics, graph them and alert on them, along with all the events you care to keep :-)

We've been through a number of data stores ourselves, starting with Postgres back in 2010 - then on to Redis + Cassandra before we built our own. But that's a story for another post...


Did you consider a hybrid solution? You could store the most recent data in time series database for visualization purposes and dump the rest into a traditional SQL data store. Other than that, IIRC Grafana had plans for PostgresSQL but it's not there yet.

> 20 million rows

> 30 seconds

Ugh, that's terrible. grepping the file or reading and parsing a csv file is probably faster.

It was pretty awful watching the vps choke to death when I tried implementing any feature using full set prices. Pegged at 99% cpu usage with the go garbage collector frantically trying to not let the process crash... that was not an environment I wanted to take to production.

Could it be the environment didn't have enough memory to do things? I've seen frantic swapping and garbage collection before in VMs without enough RAM.

> vps

There's your problem. If you're at the "make it fast" part of "make it work, make it right, make it fast", you should almost certainly be on dedicated hardware.

That was actually the "make it work" portion: I was attempting to grab some 10 full set prices at a time for a gallery feature and managed to crash the influxdb.

For those of you reading this that are interested in getting on InfluxDB 0.9. If you intend to use authentication and have a somewhat high request rate, I advise you to wait until the fixes related to #3102 (https://github.com/influxdb/influxdb/issues/3102) are included. Without that fix your CPU is killed because InfluxDB is bcrypting on every request.

That explains why my updates on 100ms intervals seemed like hogging resources.

On the performance problem, my guess is that you wrote in a bunch of columns and had those in where clauses. I hope the documentation made it clear that you'd be range scanning over data and that you probably wouldn't get desirable performance.

In 0.8 and before, the preferred way to model your data was to create many separate series names. This method is currently giving many users great performance.

As with all databases, how you model your schema has a significant impact on performance.

That being said, it might be that Influx wasn't right for your use case and Postgres is just a better thing to go with.

Also, we're not supporting anything prior to the 0.9 series of releases. 0.8 is deprecated and we'll be pushing everyone to move over to 0.9 as we put out more point releases and fix bugs and ensure that it works for their use case over the coming months.

I evaluated InfluxDB for an advanced packet capture and processing application and it couldn't handle things very well. Namely expiry of old data, blocking too much on inserts. So I wrote my own in Python + C extensions. It turned out well. Has been going non-stop for year and a half now.

What data structures does it use?

For the main data and index just a plain sorted list of tuples. Main data file is [ (timestamp,data), ...] and index is [ (timestamp,offset_in_datafile), ... ]. There is a requirement that ntp server must be running and the machine synchronized to it.

Then all the files have a prefix that look like <startsec>_<startmicrosec>_<metric>. A binary search is perfomed first on files in order to pick the right files only. Then the index is read into memory, then binary search performed on the index to get the offset range and then the main data file is read.

When switching to new file time chunk, a cleanup action is peformed where previous old files are removed.

Just for the record, InfluxDB 0.9 seems to actually be production ready now. Though it doesn't look there's an easy way to migrate to it yet from 0.8.

"Seems to be", sure. Sinks like a tanker when you try to actually use it, though. None of the software libraries have been updated for 0.9 yet.

> Sinks like a tanker when you try to actually use it

Can you name some examples of this besides the CPU authentication bug? Honest question, I'm considering adding InfluxDB to our stack so I'm genuinely curious.

We've had issues with Influx being a massive resource hog. Not properly persisting data to disk. They openly admit that Influx isn't built to gracefully recover from crashes, and that you will lose data.

This by itself wouldn't be a problem for us. But when you're inserting data points, the entire damn thing seems to become unresponsive. Admin UI freezing, http endpoint no longer responding to queries, ...

I know this product is relatively young and your mileage may vary. They have a long way to go.

Based on the changes in 0.9.1, there's no way that 0.9 was production ready and I'd be very cautious about using it.


Authentication in 0.9 has a high CPU penalty right now: https://news.ycombinator.com/item?id=9810538.

It'd be great to see a more detailed guide to using and tuning PGSQL for use as a TS DB.

As an addendum, I'd like to be able to do time-based append-only operations in Postgres, a la Datomic.

It pains me to UPDATE and DELETE, thereby destroying useful data.

Would Timetravel module work for your use-case: http://www.postgresql.org/docs/9.1/static/contrib-spi.html#A...

Same here. InfluxDb has a pretty nice DSL. I wish they switched their backend to something usable and mature instead of re-inventing the bicycle. Operationally, it is expensive to support specialized databases like Influx, unless it is your core business, I guess...

> influxDB 0.8 eating 50% of the VPS' cpu and 77% of the ram while idling

> to [InfluxDB's] credit it barely moved beyond idle resource usage when I was stuffing it full of data.

solution: always be cramming it full of data ?

I don't see how it could idle when you're bulk writing, but when it's not serving any traffic it's taking up 50% of cpu (unless it's doing some kind of indexing or cleanup or something in the background, but you would expect that to eventually quiesce).

I think he meant that idle CPU (50%) and full throttle usage (51%) were hardly different? Who knows. :P

Curious if anyone has experienced problems like this in InfluxDB 0.9 ?? My 0.9 implementation is performing well, but has only small amounts of metrics for testing - not yet rolled out to production.

Prometheus uses a file per timeseries, with two levels of delta encoding to keep the data small. This is our second major storage iteration, and seems to be doing pretty well with a single server able to handle over 2M timeseries.

See http://prometheus.io/docs/introduction/faq/#why-does-prometh...

Also, Prometheus solves the batching problem that the article mentions by keeping chunks of recent data for each time series in RAM and only appending those chunks to their series files (of which there can be millions) when they are complete (and in as big groups as possible). Periodically, and when shutting down Prometheus, the RAM-buffered time series heads are written out as a linear checkpoint to disk and recovered upon startup.

You might think this could be better solved by a simple write-ahead-log (WAL), but the problem is that samples can come in at completely irregular intervals, and we still need to append data from the WAL to the appropriate time series files once chunks for a specific series are full. That would create frequent holes of already persisted data in the WAL, and it would have to be regularly compacted/rewritten. Which is in principle equivalent to the regular checkpointing approach again.

EDIT: There's some more information about Prometheus's storage here: http://prometheus.io/docs/operating/storage/ - but we really should write a white paper or blog post about it soon.

I don't understand why Prometheus uses leveldb to begin with though, why do you need it vs using a const offset of an interval? From reading it seems it's used for an index, I'm guessing because intervals can be variable?

LevelDB is only used for indexes to look up series files by sets of dimensions, not for time-based lookups. We just need to find the right time series that are relevant for your query.

As a simple example, we have one LevelDB index which has single label=value pairs as the keys and as the LevelDB value, the identifiers of the time series which have those label=value dimensions. If you now query for e.g. all time series with labels foo="biz" AND bar="baz", we will do two lookups in that index: one for the key foo="biz", and one for bar="baz". We will now have two sets of time series identifiers which we intersect (AND-style matching) to arrive at the set of time series you're interested in querying. Only then do we actually start loading any actual time series data (not from LevelDB this time).


Baron Schwartz article on Time-series Database Requirements is quite a read: http://www.xaprb.com/blog/2014/06/08/time-series-database-re...

HN discussion: https://news.ycombinator.com/item?id=9166495

I'm not sure it's universally correct to dump authorizations and visibility from the up-front requirements. It might be regarded as another aspect of dimensionality, but that could to missed opportunities for optimization.

I'm always amused when I see the criterion for a "very dense" time series as data being collected more than once per second. In my business (telemetry), we often record parameters thousands of times per second, depending on what we are trying to measure.

I've got a system here capable of collecting data close to a hundred thousand times per second from tens of sources, with perceptual real time (< 10ms) processing and aggregate monitoring.

It's called multitrack audio recording, and commonly runs on your run of the mill laptop. HD video would probably be a few orders of magnitude more data than that, with even more processing.

Computers are really, really good at these kinds of things if only the software is efficient enough.

Your last sentence is well taken, but the rest leaves me a bit puzzled about what you meant.

Your run of the mill laptop is a wonderful machine capable of amazing things. But that laptop is not rated to withstand extreme vibration and shock and huge temperature ranges. Nor does it have the external interfaces to deal with large numbers of custom, high speed digital data buses, nor does it contain the signal conditioning hardware to deal with a huge array of transducers working with differing physical phenomena, each with its own unique power and signal processing requirements. The system I support has to deal with all of these things, plus more constraints than are relevant to the conversation here.

Much (but not all!) of this doesn't even require a general purpose computer. It can be done with state machines implemented in FPGAs.

Sometimes you need something different than a run of the mill laptop.

I apologize if it came across as a snarky comment or one-upmanship, I merely intended to support your original comment.

No problem.

Like I said, you do have a very good point. I am often frustrated by the realization that my laptop is much more powerful than this specialized data collection equipment. But we have a very large fixed base of equipment, with no plans to upgrade anytime soon.

Do you use a database for your data? We store our flight test data as individual DATAQ files with a naming scheme. This is manageable for our very simple civil aircraft so far.

Ah, a fellow flight test person. I don't handle the warehousing of data, so my knowledge is limited. I know a database is involved and used but I could not say what database or provide any specifics. Sorry.

EDIT: I would like to know how other flight test people visualize their test data. Seems like you can only make so many envelope charts and strip charts (they work and are effective, but man, sometimes I'd like something different)

All we use are strip charts. It's all I saw at Boeing as well (through some proprietary charting software IIRC). What I'd like is some "rich" time series data format that stores a history of modifications (removing noise, filtering etc.), and an overall better interface for scrolling through the data (Windaq is pretty barebones).

Wow, that is pretty rough. That's so rough that I would consider figuring out some way to export your data from that software and then visualize it with python/matplotlib or similar.

I've been working on a Python module to read the files directly: https://github.com/choj/cldumpread/tree/master/WindaqReader/.... Scipy/numpy/matplotlib totally opens up what we can do with the data. The hard part is convincing a mechanical engineering manager who's extremely averse to new tools.

I've been there. Once I had to jump through huge hoops just to be allowed to install Python on a machine not connected to any networks.

I remember talking to someone that had an in-house flight test data warehouse with these kind of features, I think it might have been Cessna - give their FT dept a ring :)

I use kst for plotting, transforming, visualizing, zooming, scrolling etc. It handles real time data nicely but I'm only doing 10 hz from a Plc using modbus protocol.

The collection time is only a single dimension, to see the scope you also need to know how many different pieces of data are being collected and for how many discrete groupings. Compare 100 inputs for a single device 1000 times a second to 100 inputs times 1000 devices once a second.

Fair enough. In the telemetry world the second case you mention is rarely seen. I'm sure someone, somewhere is doing something like that, so I won't say never. In flight testing you have anywhere from a few measurements to thousands of measurements gathered from physical transducers, and then digital data bus information that can vary dramatically from platform to platform, from just a few kilobytes/sec to gigabytes/sec.

There's many, many cases where you have lots of discrete inputs but a lower time threshold. In/out bytes per port on switches in a datacenter. CPU/memory/disk stats on virts you spin up for on-demand load, or just your normal datacenter deployment. Employees/vehicle locations in the field. Often you don't even need per-second time granularity, but that can be made up for with a larger number of discrete inputs (how many trucks does UPS have in service?)

Indeed, those cases are normal in other industries, but not in mine.

Not that funny anymore if you try to record 1MM time series at once. :)

I'm not anywhere in that realm because I don't have a data system that could support such a large amount of data. But then, I also have a physical system that does not need observations of 1 million distinct things (at least, not yet).

Can you be more specific? Is it car engine telemetry?

Aircraft telemetry.

You really need to know the position of an aircraft to a thousandth of a second?

I said nothing about position. But a lot of things measured have to do with how the structure of the aircraft responds to turbulence, rough air, and aeroacoustic vibration (aka flutter). So there might be modes where structural components have harmonics that are pretty high (several hundred to over 1000 Hz). Therefore you must use a transducer that has a frequency response that can cover that range, and sample the output of the transducer at least twice that rate (at an absolute theoretical minimum, but rule of thumb is 5x oversampling).

The highest sample rates I remember seeing was 20kHz, for the pressure sensors used in turbofan inlet distortion testing.

That isn't out of line with the ways I have seen that sort of thing measured.

Possibly he might measure some other thing, but still after your question I started to wonder how much a passenger plane moves per thousandth of a second.

So: some quick googling suggests that "economical cruising speed" of an Airbus A320 is 840km/h [1]. After quick back-of-envelope calculations, this gives ~230m/s, so 0.2m per 0.001s. Given some possible uncertainty of a single measurement, I'd imagine that's not unreasonable level of precision when e.g. your Airbus is landing on an airport.

[1]: http://www.airliners.net/aircraft-data/stats.main?id=23

I guess I should mention that GPS position is a perfectly normal thing to include in a telemetry stream.

+ ins position of course; gps position is not reliable enough (less gbas) to land a big plane with.

Really? The big boy in the field KDB+ isn't mentioned? Kx's database is pretty much the gold standard for performance in time series, historical and real-time.


I use kdb/Q at work and it’s a fun tool to play with so long as someone else is paying for it. It is quite common in finance (and comparably uncommon outside of it). It’s very expensive of course, and the learning curve is hard. In fact, there are plenty of businesses that have sprung up around kdb that offer consultancy services to help you get started. In an unusual maneuver, one of these consulting businesses actually ended up buying majority of Kx Systems, the developer kdb/Q. Anyway, if you know your Q and C++, you will always have a job in finance.

Part of the reason why it’s hard to learn (unless your job depends on it so you are forced to persist) is that the syntax is very terse. Check out this Java API for example: http://kx.com/q/c/kx/c.java. Yes, that’s the actual code you copy-paste into your Eclipse to get started.

32-bit version is free to play with nowadays: http://kx.com/software-download.php

what are unique selling points of kdb compared to other databases aside from the query language?

> Aggregation and expiry start to look a lot like dimensionality: they can be implemented asynchronously in a separate policy layer.It doesn't seem important that the actual storage engine is mindful of these; in fact, it's probably better ignored for efficiency's sake.

I wrote my own time series database and actually having well working expiry was harder than expected. I sharded each one of the files into 2GB or 24h (whatever came first) chunks. Then had to be careful how they got deleted. There were various rules there such as ("make sure to not fill partition more than 80%", "stop when partition is full" or "keep data no more than 90 days").

My number of different time series was actually pretty low but each value was a record with complex data inside it i.e. not just simple ints or floats. For example, one series was capturing network packets so it looked like [ (timestamp,packet), (timestamp, packet), ... ] . But then an indexing service was running separately in a separate OS process to generate additional complex indexing from the primary data.

What industry or app was this for? I wrote something that sounds like this for VoIP packet capture. Though the rates were around 5TB/day. I'd keep the latest indexes in RAM before delta-encoding and writing out. Periodically, merge multiple indexes into one.

Worked surprisingly well, and the indexing overhead was only a few bits per packet (given enough similarity in packets over a short chunk of time (say, 1 minute), then just indexing unique values per chunk -- it was all fairly efficient.)

VoIP capture was one of the functions in the system too. But it had others. It looks like your implementation is more sophisticated, I didn't do any delta-encoding. Just periodic fsyncs. The maximum we tested was up to 1TB/day.

Main index ( [(timestamp,offset_in_data)] was simply synced every second. It just looked like a [uint64|uint32|...]. Then there is an advanced service with different plugins, runs in a separate process, trails the main data stream. Each plugin can then write out its own custom index data to file and is also saving its internal state (so it can recover during a hard crash).

Eh well mine was mostly a hack to index SIP packets. Other existing products just shoved every packet into a MySQL table. One column per field, with indexes. You can imagine that does not scale very well. So I ended up doing similar to what you describe.

Fun how designs converge. You'd think there'd be some generic libraries for this kind of stuff. Now there's LevelDB and its SSTable is sort of close but uses Snappy so it can't operate on compressed data.

What's the common library that implements fast, compressed, immutable key values without lots of per record overhead?

https://github.com/rackerlabs/blueflood is a time series datastore with aggregation and rest APIs built on top of Cassandra. It's production ready and in fact used in production at Rackspace. Full disclaimer: I am a core contributor.

Well, all you need to do is to look at KDB+ from kx.

All the other products that you mention are for children ;)

I know the Kx people pretty well, and they are trying to get the word out (have been for years), but it never ceases to amaze how little respect they get in the free software world. They are the leading timeseries database, and yet they don't even get a footnote in the article :(

I have actually heard of Kx and Kdb+ and Q and have looked into those products. You're right that I should have mentioned them, because they are important in that space, and in many ways they are miles ahead. The state of time series query languages is quite poor (outside of Influx's effort), so there is a lot to learn from Kdb+ as well.

I was however focusing on recent Open Source efforts and on the general approaches. Hence, I didn't really discuss in detail my own tsdb.

I also find virtually everything to do with K and Kdb to be be simultaneously impressive and utterly unfathomable:


This is a cheap jab to make, but it makes these systems pretty impenetrable from a source level.

Possibly because they cost an arm and a leg (or at least that's the perception) and are therefore out of reach of most firms, apart from large utilities and hedge funds, and the language looks like line noise.

Yes, I know there is a free version, but limited to 32-bit only (and probably non-commercial?).

EDIT: 32-bit version can be used commercially.

Agree about the cost. However, I would think that a wider adoption would eventually bring the cost down and perhaps even spawn a bunch of related open-source projects.

As far as readability is concerned, q(KDB+) is far more readable than k(KDB). Also, nobody stops you from adopting a coding style that is more readable. That is what I personally do.

OP works on the time-series processing backend at Datadog http://datadog.com

I've been pretty happy with OpenTSDB (a TSD built on top of HBase). It happily ingests tens of thousands of points per second, supports full-resolution historical data, and has reasonably fast queries.

The main downside is HBase is operationally complex, but if you've already made the investment there (as we had), it's a great option.

If you are looking for time-series databases based on Cassandra that you can use with Graphite, check out

Cyanite: https://github.com/pyr/cyanite and https://github.com/brutasse/graphite-cyanite

KairosDB: https://github.com/kairosdb/kairosdb and https://github.com/kairosdb/kairos-carbon and https://github.com/Lastik/KairosdbGraphiteFinder

Have you used cyanite in any meaningful way? The original author's (pyr) repo has been pretty dead.

This person has been doing a lot of good work: https://github.com/mwmanley/cyanite

I've considered it but haven't made the jump yet. Pyr gave a presentation a few weeks ago that suggested his company is already using it and further development is coming.


Why does every TSD seem so overly engineered and for all the wrong reasons? Why not just use a time decaying ring buffer (multiple buffers could be used), one statistic one file (or more depending on the decay) and offset by a set interval if you have irregular intervals 'smooth' it to fit O(1) for most things. My other issue is (from a glance) looking at some TSD they ignore most research done on how to effectively store the data.

Too many writes, use a cache to batch. No metadata? put the metadata in a different file and point to the offsets.

Are you one of the guys that implemented a Twitter clone during the weekend?

    Why not just use a time decaying ring buffer 
Because you don't want to through the data away. RRDTool and Whisper implement ring buffers but you lose data and resolution with them. If that's acceptable then absolutely use those tools. If you don't want to lose data though then you need something else.

You don't have to decay, I just assumed eventually you would want to decay the data but in a separate ring buffer, thus no resolution loss.

This is how things work at the industrial plant where I work. We dump a heap of data out of PLC's at basically whatever the native frequency the instrument can log it hits the first datastore (which is essentially a flushed ring buffer) where it might get thrown at a HMI display or something like that after that it decays into the slower historical archive where it can get some metadata added to it and get rationalized aggregated, batched together or whatever.

It varies but datastores tend to be 1 hour worth of data, then 3 days, then 3 months, then permanent. As you move between datastores latency to access a timestamp increases.

We tend to call Time Series databases "Data Historians" its a big industry and from what I can tell most commercial products are built around ring buffers.

What you are describing is exactly what Carbon does (The datastore of Graphite). It's a few hundred lines of not-exactly-production-quality python.

But if you'd actually RTFA, you'd know that Whisper is hardly the end-all of TSD, that cache to batch is not so trivial as you make it seem.

I do believe there's sort of optimal solution that isn't very complex and not far away from many solutions out there. But most definitely more complex than the TSD's in existance today.

You say over-engineered, but if you'd take a look at what's out there, they're all super simple. Usually being a simple layer over some pre-existing storage engine.

I'd be surprised if in any of the currently popular TSD's there more than a month or two of fulltime work put in just the storage engine. Even though there are so many, no one has time to make them as optimized as say popular SQL databases are.

You could 'decay' the buffer into another by anything, stronger compression, time. Over-engineered in the sense that it's layers of tools that don't need to be there.

The trade off is decaying data. We store and query 300,000 counters with 10 days of 10 second resolution and successive buckets out to six years, on a single bare metal server (+ backup replica) using GitHub.com/imvu-open/istatd but this is not right for everyone.

A few comments on some things I'm seeing in these.. comments.

- What about Cassandra/PostgreSQL/Redis?

One of the implications of "99% of data is never read" is that it's incredibly wasteful to keep it all in memory. You might be assuming that you are letting the data expire eventually, but I'm actually not; expiry is a secondary concern to storage.

Once you start to involve the disk (PostgreSQL and Cassandra), you start to get into locality issues that these databases weren't really designed for.

For a more concrete description, lets say I have 2000 machines. Our app is Python, so they run 16 dockerized processes each, with each container reporting 10 simple system metrics every 10 seconds. These metrics are locally aggregated (like statsd) into separate mean, max, min, median, and 90pctile series. I've not instrumented my app yet and that's already 160k writes/sec on average; if our containers all thunder at us it's 1.6m, and frankly this is the "simple" case as:

* we've made some concessions on the resolution

* we only have dense series

Anyone who has used graphite at scale knows that this is a really painful scenario, but these numbers are not particularly big; anywhere you could take an order of magnitude there are a few other places you can add one.

I'm also assuming we are materializing each of these points into their own timeseries, but that's more or less a necessity. It gets back to the locality issues; If we wanted to see the "top 10 system load per node", it's actually quite imperative that we aren't wasting IO cycles on series that _aren't_ system load; we need all we've got to handle the reads.

(As a side point, this is why people in the cloud are adopting so much Go so quickly; it's proven to be easy to write AND read, and also to reduce orders of magnitude in several of the dimensions above, eg. "we can use 1 process per box not 16 containers, and we can get by with 1000 machines not 2000." Having to write your own linked list or whatever doesn't register in the calculus.)

- 1s resolution isn't dense:

No, not always. It's hard to please everyone with these things. In my world, 1s is good (but not great), but 10s seems to more or less be accepted universally, and much sparser (5m, 30m, 6h) is not actually uncommon. At the other end of the spectrum, you can be instrumenting at huge cardinality but very sparsely (think per-user or per-session), perhaps only a few points per timeseries, and the whole of what I've described above kinda gets flipped on its head and a new reality emerges. For what I've described, I quite like the Prometheus approach, but for my very specific use case 1-file-per-metric only beats the filesize block overhead often enough for very long timeframes; to long.

- Why are all TSDB over-engineered?

I hope some of the above has made explicit some of the difficulties in collecting and actually making this data readable. I've only actually thusfar discussed the problem of "associate this {timestamp,value} pair with the correct series in storage"; there are also the following problems:

* you can't query 3 months of 1s resolution data in a reasonable amount of time, so you need to do rollups, but the aggregations we want aren't all associative so you have to do a bunch of them or else you lose accuracy in a huge way (eg, if you do an avg rollup across min data, you flatten out your mins.. which you don't want); this means adding ANOTHER few dimensions to your storage system (time interval, aggregator)

* eventually you have to expire some of this junk, or move it to cold storage; this is a process, and processes require testing, vigilance, monitoring, development, etc.

* you need an actual query system that takes something useful and readable by a human ("AVG system.cpu.usage WHERE env=production AND role=db-master") and determines what series' actually fall into those categories for the time interval you're querying. Anything holistic system that _doesn't_ do this is an evolutionary dead end; eventually, something like Prometheus or Influx will replace them.

These are minimum requirements once you "solve" storage, which is always a very tricky thing to have claimed. If you get here, you've reached what decent SaaS did 4 years ago and what very expensive proprietary systems handled 10 years ago.

- What about Prometheus/InfluxDB/Kdb+/Et al.

Kdb+ is very expensive, its open source documentation is difficult, and its source is unintelligible. It is basically from a different planet that I'm from. Even recently, when I encounter people from, say, the C# world and tell them I work with Python and Go, they ignore Go and say "Wow, there are like no jobs for Python", which I find utterly bewildering. Of course, I never encounter any jobs using C#, either. This is how little some of these spheres overlap sometimes. Someone from the finance world is going to have to come in and reproduce the genius of K and Q for us mortals in a language we understand.

As for Prometheus and InfluxDB, I follow these more closely and have a better understanding of how they operate. I think that they are both doing really valuable work in this space.

From a storage aspect, I think the Prometheus approach is closer to the one that I need for my particular challenges than the InfluxDB one is, and in fact it looks a bit like things we've already had (but also Catena, Parquet, et al..) For most people, storage actually isn't important so long as it's fast enough.

And this is kind of the point of my article. There's starting to be a bit of a convergence among a few Open Source TSDB, and I've taken I've tried to highlight some issues in those approaches and suggest that there's room for improvement. I have my own ideas about what these improvements might look like based on my work at Datadog, and once they're proven (or even disproven) they'll be published somehow.

What are the benefits of introducing specialized time-series databases vs using Redis, Casandra, or some Sql database?

One way to think of it is why would you use Cassandra when you already have a filesystem?

Those databases are quite raw, to be useful for a use case like monitoring you need to build on top of them to allow for ability to get data from agents, do some processing and enable efficient querying of the data.

The answer to this question would be extremely long, so I suggest you first brush up on some fundamentals of database design.


then go onto getting mad at Michael Stonebraker here:


and then when you realize he's right about everything, you'll have the foundational knowledge to understand the problem a little bit better.

Very few if any if your talking Cassandra (I would never store time series in redis), I worked developing TSD and are active in the space, most companies goes with Cassandra these days and builds computational frameworks upon that.

Just curious, why not Redis?

I'm not him obviously but...

If you are at the scale you can't dump it on just 1-2 nodes and call it a day [which is when you are start looking at Cassandra or a dedicated TSD] ...you really need 3 DC availability usually and Redis simply cannot do that in any reasonable way.

Even if you don't need a dedicated TSD with redundancy - Redis would still be more expensive to run given that you you would need to keep everything in memory. Given that you won't read 90% of the data most of the time, it makes little sense to store all of it in memory

InfluxDB can downsample on the fly, to name an example. Functionality like this out of the box can be pretty handy, otherwise you have to build it yourself.

I'm surprised that PI[0] isn't mentioned in this thread. It's been around since the 80s and it's ubiquitous in the utility industry for recording SCADA data.

[0]: https://en.wikipedia.org/wiki/OSIsoft

A list of various Time-series Databases together with their estimated popularity can be found at http://db-engines.com/en/ranking/time+series+dbms

I wonder why Datomic isn't mentioned.

We store our event stream data in Elasticsearch. Two features that made it appealing:

  * the ingest-side can be scaled up by adding more shards
  * the query-side can be scaled up by adding more replicas
To compute rollup analytics, we make heavy use of Elasticsearch's aggregation framework to compute daily/weekly/monthly/quarterly active users.

From my understanding Postgres has many of these features, but the distributed features of ES are killer!

We're using ElasticSearch for events, too. The aggregation operators are really surprisingly fast.

That said, one major downside to ES is that it's not schemaless. You can try to use the dynamic mapping system, but it will most likely just bite you eventually, since ES is strict about coercing data types. If your data isn't completely consistent, it will actually refuse to index it. Any changes made to your schema also requires reindexing. (For some reason ES can't do in-place indexing, despite supporting storing all the original data in the "_source" field.)

If your data isn't perfectly consistent, one way to work around the mapping problem is to append a type name to every field. So instead of indexing {"user_id": "3"}, you index {"user_id.string": "3"}. This means that if you get some input data where the user_id is an int, it doesn't conflict because it will stored in "user_id.int". You have to handle the inconsistency on the query end, but it's possibly better than micromanaging the index.

I thought someone was going to mention http://druid.io/. I end up not using it for anything, but at some point I was investigating time series DBs and I thought it was interesting.

If I'm not wrong it combines both very fast in-memory OLAP features and what they call "Deep Storage", which I think is a way to store things on disk for slower historical analysis.

The subject of Time Series has lately been on my mind as well, see my blog posts on accuracy of Graphite vs RRD, as well as InfluxDB storage: http://grisha.org/

I am leaning towards none of the above being the best solution and am in the process of writing my own (too early to announce yet).

Like I mentioned here [0], whisper isn't the only storage option for graphite. Another user [1] mentioned blueflood. Have you evaluated any of these cassandra-based options?

[0] https://news.ycombinator.com/item?id=9808035

[1] https://news.ycombinator.com/item?id=9808662

At Foursquare, work was being done to put Hypertable under Graphite. It was incredible to use (my year+ queries returned in tens of milliseconds), but I don't know what came of it. Hypertable is criminally overlooked in the industry, and TSDB is a killer app; a little bit of glue code and you've basically invented a crude clone of Google's monitoring, with that stack.

I've looked at Cyanite, and it has the same accuracy problem as Graphite, which I describe in my blog post. Also I'm leaning towards the conclusion that something like Cassandra, which was design for Facebook scale is even a suitable store for something as compact as TS, which is just a stream of floats.

It seems that the vast majority of these open-source TSDBs are focused on fairly technical event stream data arising specifically from IT infrastructure... is anyone using them with success in more business-oriented domains, e.g. energy meter data, stock trading, other telemetry?

We were using InfluxDB for web analytics (generated by user interactions such as "viewed page", "viewed product", "refined search", etc.).

0.8 way okay, but very slow and unstable. But in 0.9 the data model is very different and no longer a good fit for this type of analytics (it looks great for devops metrics, though), so we're abandoning InfluxDB altogether.

We're now in the process of migrating to ElasticSearch, which is looking much better. ElasticSearch has its own problems, though, and we will be evaluating the same dataset in PostgreSQL soon.

Handling time-series data with Google BigQuery https://cloud.google.com/solutions/time-series/bigquery-fina...

I wonder if anyone would be interested to try a TSDB as hosted service, lets say running on top of Google Cloud Bigtable, once it moves out of beta?

What's wrong with just using plain SQL tables? How big of a dataset are we talking supporting here?

Anyone using SAP HANA for Series Data?

At Stack Exchange our monitoring system bosun (http://bosun.org) can use different time series databases as long as they can be bent into tag key+tag value models. Currently it works best with OpenTSDB, but can also support graphite (and elasticsearch populated by logstash). InfluxDB query support is in a branch, but don't want to merge until we have a devoted Bosun+InfluxDB maintainer since we don't use it at Stack currently.

Based on that experience, plus from conversations at Monitorma the other week here is what I think of the current state of some various TSDBs are. Some of this might just be lies or rumor - so take it at at that:

* OpenTSDB: Requires HBase behind it, so that can be a pain for people. Maintenance on it is sparse, it doesn't seem like the project has a shortage of contributors with the time needed. Stability isn't great (connection errors from time to time, having alerts based on querying OpenTSDB highlights this). Aggregation and downsampling don't behave as expected. For example rate derivatives happen too late in the order of operations - linear interpolation can be strange. Also to query metric with anything many tag combinations over more than a recent interval of time (say a month or more) is basically impossible - OpenTSDB memory blows up and GC dominates. This requires one to create additional metrics that are denormalized for this. This is kind of okay because OpenTSDB is incredibly storage efficient at storing time series data. No support for NaN. OpenTSDB has quite a bit of serious users https://github.com/OpenTSDB/opentsdb/wiki/Companies-using-Op.... It can ingest a lot of metrics at a high rate without issue.

* KairosDB: Not much experience here. From what I gather it is like OpenTSDB but for Cassandra. Someone mentioned that they thought they heard some core devs have gone to work at Influx which might be concerning - but I don't know if that is true. But same issue of having to run Cassandra if you don't already.

* Graphite: Very rich query language, but currently not a key / value model. Also is not very storage efficient so the approach is that data gets rolled up after a certain period of time - generally problematic for forecasting.

* InfluxDB: Looks promising, but I heard from multiple people "Tried influxdb - was cool but all my data corrupted and I couldn't recover it" at Monitorama. The general concern at Montiroma was that they are overestimating their stability currently when it comes to a production environment. Based on some basic testing at Stack, we found it to be much slower and take up a lot more space than OpenTSDB.

In summary there is no great choice today. More of a pick your pain and best fit situation. But I'm really curious what people with actual experience in these technologies can add to the tradeoffs and am hopeful for the future.

InfluxDB CEO here. Those problems with corrupting data were with the 0.8 line of releases. But to be honest there are people that have been running that and 0.7 in production for almost a year without problems. Your mileage may vary, but we're not supporting any releases prior to the 0.9 line.

For the 0.9 set of releases, this is what we're supporting going forward. There are some queries that cause the server to crash, but as far as I know, there are no problems that corrupt the database or cause data loss.

We'll be releasing 0.9.1 tomorrow. Every 3 weeks after that we'll be releasing a new point release in the 0.9 line that will be a drop in replacement.

Each one of these releases will fix bugs, improve performance, and add features on clustering (last part starting with 0.9.2).

We're starting work on the on disk size with the 0.9.2 release cycle. If it's ready it'll be in that release in 3 weeks.

Basically, it works now for some use cases and scales. Over the next 3 months we'll be adding features and optimizing to make it useful for larger scales and more use cases.

Overall it's still alpha software, which is why we haven't put anything out there that's called a 1.0 release. However, we're trying very hard to not make any breaking API changes going forward between now and whenever we get to 1.0.

I performed some OpenTSDB vs InfluxDB comparisons and found that InfluxDB used almost 20x the storage space and was 3x slower than OpenTSDB for an identical data set. The speed isn't that big of a problem and I'm convinced will get faster (esp with the write improvements in 0.9.1), but the space issue is harder to swallow.

HBase, which is what backs OpenTSDB, can compress data using LZO or snappy. Uncompressed, our test data was at 4GB, but went down to about 400MB after HBase compressed it. InfluxDB was using 8GB. OpenTSDB has done a lot of work to be byte efficient, and it's paid off. We hope InfluxDB will get to a similar place.

Yep, compression work is starting in the 0.9.2 release cycle. We'll be testing out those compression methods along with other stuff like delta encoding

Have you tried http://prometheus.io/ ?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact