
> The above mechanism needs some finishing touches. The first is data expiration. If you don't need daily data for more than 30 days back, you need to delete it yourself. The same goes for expiring monthly data - in our case stuff older than 12 months. We do it in a cron job that runs once a day. We just loop over all series and trim the expired elements from the hashes.

Rather than iterating over the entire list of series and checking for expired elements, you can use a sorted set and assign a time-based score. The cron job can still run once a day, but it can remove all members whose scores fall below a time threshold in a single command (ZREMRANGEBYSCORE), which will almost certainly be faster.
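
A minimal sketch of the idea, assuming redis-py; the key name "metrics:hits" and the 30-day window are illustrative, not from the article:

  import time
  import redis

  r = redis.Redis()

  # Record an event with the current Unix timestamp as its score.
  now = time.time()
  r.zadd("metrics:hits", {f"event:{now}": now})

  # Daily cron job: drop everything older than 30 days in one command,
  # instead of looping over every series and checking each element.
  cutoff = now - 30 * 24 * 3600
  r.zremrangebyscore("metrics:hits", "-inf", cutoff)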

Naturally this will increase memory usage (which may be undesired), but it's food for thought. Eventually the looping and trimming of expired entries could be done with Lua server-side scripting in redis-2.6, which is interesting in a different way and has its own challenges.
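
A sketch of the server-side variant, again with redis-py. It assumes a hypothetical Redis set named "series" that tracks every series key (not something from the article); the Lua script then trims all expired members in a single round trip:

  import time
  import redis

  # Loop over the registered series keys inside Redis and trim each one.
  TRIM_SCRIPT = """
  local removed = 0
  for _, key in ipairs(redis.call('SMEMBERS', KEYS[1])) do
      removed = removed + redis.call('ZREMRANGEBYSCORE', key, '-inf', ARGV[1])
  end
  return removed
  """

  r = redis.Redis()
  cutoff = time.time() - 30 * 24 * 3600
  removed = r.eval(TRIM_SCRIPT, 1, "series", cutoff)
  print(f"trimmed {removed} expired entries")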




The problem with this implementation is that you can't have multiple entries for items with the same value. For example, you might be doing metrics for each user in a web application and want to measure access times. The obvious choice is the IP address for the member and the timestamp for the score, but when you add a member that already exists in a ZSET, its previous score is simply replaced.

That makes ZSETs difficult to use for metrics unless you only care about uniques.
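
A quick demonstration of the overwrite with redis-py (the key name "access_times" is illustrative): re-adding the same member just updates its score, so the earlier timestamp for that IP is lost.

  import redis

  r = redis.Redis()

  r.zadd("access_times", {"10.0.0.1": 1000.0})
  r.zadd("access_times", {"10.0.0.1": 2000.0})  # replaces the 1000.0 score

  # Only one entry remains for the IP, with the latest timestamp.
  print(r.zscore("access_times", "10.0.0.1"))  # 2000.0
  print(r.zcard("access_times"))               # 1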





