

A viable replacement for RRD for storing time-series data - anuj

RRD is an awesome tool, but it loses data over time because it averages older samples. I am looking for something that is almost as efficient as RRD and loses no data over time. I am fine with the extra disk space usage.
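
To make the averaging concern concrete, here is a minimal Python sketch of what an averaging roll-up does to a short spike. This is not RRD's actual code; `consolidate` and the sample values are hypothetical, chosen only for illustration.

```python
def consolidate(samples, factor):
    """Average each run of `factor` consecutive samples into one value,
    roughly what an averaging archive does when it rolls up older data."""
    return [
        sum(samples[i:i + factor]) / len(samples[i:i + factor])
        for i in range(0, len(samples), factor)
    ]

# Six raw samples containing one short spike (made-up values).
raw = [10, 10, 10, 90, 10, 10]

# Roll up three raw samples into one consolidated point.
rolled_up = consolidate(raw, 3)

# The spike is smeared into its bucket's average, and the original
# peak value cannot be recovered from the consolidated points alone.
print(max(raw), max(rolled_up))
```

Keeping every raw sample avoids this, at the cost of the disk space mentioned above.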
======
dhm116
We've played around with:

OpenTSDB: <http://opentsdb.net/>

StatsD: <https://github.com/etsy/statsd/> (description here -
[http://codeascraft.etsy.com/2011/02/15/measure-anything-
meas...](http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-
everything/))

OpenTSDB and StatsD seemed great for getting TS data in and producing nice
dashboards, but they didn't seem to fit our needs for performing custom
analytics on the data.

At the moment, we're leaning towards leveraging Cassandra based on our
scalability requirements. Check out
[http://rubyscale.com/blog/2011/03/06/basic-time-series-
with-...](http://rubyscale.com/blog/2011/03/06/basic-time-series-with-
cassandra/) and [http://www.datastax.com/dev/blog/advanced-time-series-
with-c...](http://www.datastax.com/dev/blog/advanced-time-series-with-
cassandra) to get an idea of how Cassandra can help.
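
As a rough sketch of one common way to lay out time series in a wide-row store, the Python below models rows as plain dicts rather than talking to Cassandra. It assumes one row per metric per UTC day, with timestamps as column names; `row_key`, `insert`, `read_range`, and the `cpu` metric are hypothetical names, not part of any Cassandra client API.

```python
from collections import defaultdict
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def row_key(metric, ts):
    """Build a row key such as 'cpu:20110306' from a metric name and a Unix timestamp."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"{metric}:{day}"

# In-memory stand-in for a column family: row key -> {timestamp: value}.
rows = defaultdict(dict)

def insert(metric, ts, value):
    """Store one raw sample; nothing is averaged or overwritten by a roll-up."""
    rows[row_key(metric, ts)][ts] = value

def read_range(metric, start, end):
    """Return (timestamp, value) pairs with start <= timestamp < end, in time order."""
    out = []
    day = start - start % SECONDS_PER_DAY  # midnight UTC of the first day touched
    while day < end:
        cols = rows.get(row_key(metric, day), {})
        out.extend((t, v) for t, v in sorted(cols.items()) if start <= t < end)
        day += SECONDS_PER_DAY
    return out

base = 1299369600  # 2011-03-06 00:00:00 UTC
insert("cpu", base + 10, 0.42)
insert("cpu", base + SECONDS_PER_DAY + 5, 0.87)
print(read_range("cpu", base, base + 2 * SECONDS_PER_DAY))
```

Bucketing by day keeps any single row from growing without bound, and a range read only touches the rows for the days it spans.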

~~~
anuj
Have you played around with MongoDB?

~~~
dhm116
Actually, we started out using mongo to focus on the analysis instead of the
schema, but we quickly ran into performance issues as the datasets grew. We
were simply using mongoengine for Python, so we didn't spend a significant
amount of time trying to optimize our schema or implement things like
sharding.

Our performance issues with mongo largely stemmed from our poor use of
indexes: we defined a lot of them because _how_ we needed to query was a very
organic and undefined process that shifted as we got new analysis
requirements. Because we would frequently have to go back and compute new
feature vectors across the whole dataset (or large parts of it), we weren't
able to implement a lot of the aggregation capabilities you'll see implemented
in many other time series schemas.

------
anuj
I have seen this slide deck and it looks inspiring:
[http://www.slideshare.net/sky_jackson/time-series-data-
stora...](http://www.slideshare.net/sky_jackson/time-series-data-storage-in-
mongodb)

