Hacker News

Redis works great as an LRU cache and is much more space-efficient than an in-process LinkedHashMap, especially when the keys and values are small. Plus, an LRU cache wreaks havoc with Java's generational garbage collector as soon as it fills up: every entry you put in is all but guaranteed to survive into the oldest generation, and then it will most likely be evicted from there.
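For reference, the in-process alternative being compared against can be sketched with `LinkedHashMap` in access-order mode plus an overridden `removeEldestEntry`. This is a minimal, single-threaded sketch; the class name `LruCache` and the capacity used in `main` are illustrative assumptions. Every value it holds stays strongly reachable until it is evicted, which is why long-lived entries end up promoted to the old generation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal in-process LRU cache (hypothetical example class).
 * accessOrder = true makes get() move an entry to the tail,
 * so the head of the internal list is always the least recently used entry.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // illustrative capacity limit

    public LruCache(int maxEntries) {
        // initial capacity 16, load factor 0.75, iterate in access order
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes the least recently used
        cache.put("c", "3"); // exceeds capacity, so "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

For concurrent use it would need external synchronization (for example `Collections.synchronizedMap`), because in access-order mode even `get` restructures the internal linked list.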

Redis would blow the latency budget, though, right?

Maybe I'm missing something here, but I was under the impression Redis is one of the fastest data stores out there. What do you mean it would blow the latency budget? I'm curious because I've switched my startup's backend to node+redis.

Thanks in advance!

He means it is being compared to the CPU cache (from the article), so there are several orders of magnitude of difference. From this summary on Stack Overflow: http://stackoverflow.com/questions/433105/exactly-how-fast-a...

  - CPU registers (8-32 registers): immediate access (0-1 clock cycles)
  - L1 CPU caches (32 KiB to 128 KiB): fast access (3 clock cycles)
  - L2 CPU caches (128 KiB to 12 MiB): slightly slower access (10 clock cycles)
  - Main physical memory (RAM) (256 MiB to 4 GiB): slow access (100 clock cycles)
  - Disk (file system) (1 GiB to 1 TiB): very slow (10,000,000 clock cycles)
  - Remote memory (such as other computers or the Internet) (practically unlimited): speed varies
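As a quick worked check on those figures, this sketch converts the quoted cycle counts into orders of magnitude (the base-10 logarithm of the ratio), taking a register access as 1 cycle; the remote-memory row has no figure and is left out. Main memory comes out 2 orders of magnitude slower than a register and disk 7, so the whole hierarchy spans several orders of magnitude rather than tens. The class name `LatencyTiers` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatencyTiers {
    /** Orders of magnitude between two latencies: log10(slower / faster). */
    public static double ordersOfMagnitude(double fasterCycles, double slowerCycles) {
        return Math.log10(slowerCycles / fasterCycles);
    }

    public static void main(String[] args) {
        // Cycle counts quoted from the Stack Overflow summary; a register access is taken as 1 cycle.
        Map<String, Double> tiers = new LinkedHashMap<>();
        tiers.put("L1 cache", 3.0);
        tiers.put("L2 cache", 10.0);
        tiers.put("Main memory", 100.0);
        tiers.put("Disk", 10_000_000.0);

        for (Map.Entry<String, Double> e : tiers.entrySet()) {
            System.out.printf("%-12s %.1f orders of magnitude slower than a register%n",
                    e.getKey(), ordersOfMagnitude(1.0, e.getValue()));
        }
    }
}
```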

But if you have a Redis cache on the same box (he says he only has one box anyway), it's still in the same category, "Main physical memory", with maybe some communication overhead.

"maybe some communication overheard" is orders of magnitude slower than L1/L2.

Hmm, there is something very wrong here. I'll try and explain in a blog post.

But we're not talking about registers or caches. 800 MB stashed in a giant Java hash is not going to be in L1 or L2 cache.

There is a HUGE overhead when going through the network. Even if you don't go over the network (localhost), there's still the overhead of the TCP/IP stack. Even if you avoid that, there's overhead from UNIX sockets or whatever you use for inter-process communication.
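To make that overhead concrete, here is a rough, unscientific sketch that times `HashMap.get` in-process against a 4-byte request/reply over a localhost TCP socket. The echo server is only a stand-in for an out-of-process store (it does not speak the Redis protocol), there is no JIT warm-up, and absolute numbers will vary a lot by machine and OS; only the gap between the two results matters. The class name `RoundTripDemo` and the iteration count are arbitrary choices.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

public class RoundTripDemo {
    static final int ITERATIONS = 10_000;

    /** Average nanoseconds per HashMap.get inside this process. */
    public static double inProcessNanos() {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 1024; i++) map.put(i, i);
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) sum += map.get(i & 1023);
        long elapsed = System.nanoTime() - start;
        if (sum < 0) System.out.println(sum); // use the result so the loop isn't dead code
        return (double) elapsed / ITERATIONS;
    }

    /** Average nanoseconds per 4-byte request/reply over a localhost TCP socket. */
    public static double localhostTcpNanos() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Echo server on another thread stands in for an out-of-process cache.
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    DataInputStream in = new DataInputStream(s.getInputStream());
                    DataOutputStream out = new DataOutputStream(
                            new BufferedOutputStream(s.getOutputStream()));
                    for (int i = 0; i < ITERATIONS; i++) {
                        out.writeInt(in.readInt()); // echo the "lookup" back
                        out.flush();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setTcpNoDelay(true); // send each tiny request immediately
                DataInputStream in = new DataInputStream(client.getInputStream());
                DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(client.getOutputStream()));
                long start = System.nanoTime();
                for (int i = 0; i < ITERATIONS; i++) {
                    out.writeInt(i);
                    out.flush();
                    in.readInt(); // wait for the reply: one full round trip
                }
                long elapsed = System.nanoTime() - start;
                echo.join();
                return (double) elapsed / ITERATIONS;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("in-process HashMap.get:   %.0f ns%n", inProcessNanos());
        System.out.printf("localhost TCP round trip: %.0f ns%n", localhostTcpNanos());
    }
}
```

A UNIX domain socket would skip part of the TCP/IP stack, but a blocking request/reply still typically costs system calls and a wake-up of the other process on every lookup.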

It probably doesn't matter, though.

And yes, Redis is very fast, and you gain a lot by using it compared to just a hash in the same process.

So are you saying that, because of the overhead involved in "talking" to Redis, the fastest data store would actually be my own implementation of an LRU cache (or a readily available one like node-lru-cache) inside my Node app? In other words, the data store would essentially be a plain JavaScript object embedded directly in the script, with methods for LRU eviction, gets, and backups?

Well, the point is only that in-process is the fastest there is. By using something like Redis you give up some speed but gain features: queues, pub/sub, etc. But that doesn't mean you have to implement those features yourself; they might be available as a library for your language of choice as well.

The bigger win, IMHO, is that you gain flexibility: since Redis (or whatever) is decoupled from your process, it can run on another processor, on another machine, or across many machines.

Not sure how an in-process cache would work in Node, being async and all, but yes, in-process is faster. But then you have to think about stuff like:

  - how do you avoid losing everything when Node crashes / restarts?
  - what if another process needs to read/write to the cache?
  - what if you need more memory than a single machine provides (probably not going to happen)?
  - implementation bugs

In-process: faster, but probably harder to scale. But then again, it might be so fast you don't have to.

Would Redis evicting something from the LRU make some older mails unreadable?

Don't know. But you make it sound like a bad thing. If you're out of memory, you kind of have to evict stuff, don't you?

I get the feeling you're kind of anti-Redis, and I don't get why. Redis is a very cool project and could be useful for a lot of things. It's not Redis' fault that some people misuse it.

I am not at all anti-Redis. It's not at all Redis's fault that it gets misused, either; I even say so in the blog post.

So why would you compress something that you can only decompress if it's been recently reused?

How would you do Mailinator with your strings in Redis, taking O(n) calls to Redis to recover them and decompress an email, where n is the number of lines (or runs of consecutive lines, granted) in the email?

"I am not at all anti-redis. Its not at all redis's fault it gets misused either; I say so that in the blog-post, even."

Yeah, you actually do. Sorry.

"So why would you compress something that you can only decompress if its recently-reused?"

Not sure I understand your questions, and I've only just started looking at Redis. I guess you could do it the same way, though the added latency may make it infeasible. The better answer is probably that you don't: you would modify the implementation to fit the strengths and weaknesses of Redis (or whatever you use).

Trying to work out how to fit Mailinator into Redis, rather than questioning whether Redis fits Mailinator, is exactly the cargo-cult cool-kids zombism I was ranting against, though ;)

You really can process Mailinator quantities of email with a simple Java server using a synchronized hash map and a linked-list LRU, and still have some CPUs left over for CPU-intensive, opportunistic LZMA compression.
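A sketch of that kind of design, assuming line-level deduplication as discussed in this thread: an access-ordered `LinkedHashMap` (a hash map threaded with a linked list) behind a `synchronized` method hands out one shared `String` per recently seen line. Emails keep direct references to those strings, so evicting a line from the LRU only means a later copy of it won't be deduplicated; emails already stored stay fully readable. The class and method names are hypothetical, this is not Mailinator's actual code, and the LZMA step is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical line-deduplicating store backed by a synchronized LRU. */
public class LineStore {
    private final Map<String, String> recentLines;

    public LineStore(int maxLines) {
        // Access-ordered LinkedHashMap: hash map plus linked-list LRU.
        this.recentLines = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxLines;
            }
        };
    }

    /** Returns a shared instance of the line; synchronized because get() reorders the list. */
    public synchronized String canonical(String line) {
        String shared = recentLines.get(line);
        if (shared != null) {
            return shared; // recently seen: reuse the existing String
        }
        recentLines.put(line, line);
        return line;
    }

    /** Stores an email as a list of references to shared line instances. */
    public List<String> store(String email) {
        List<String> lines = new ArrayList<>();
        for (String line : email.split("\n", -1)) {
            lines.add(canonical(line));
        }
        return lines;
    }

    /** Rebuilds the email; works even after its lines have been evicted from the LRU. */
    public static String read(List<String> storedLines) {
        return String.join("\n", storedLines);
    }

    public static void main(String[] args) {
        LineStore store = new LineStore(1000);
        List<String> a = store.store("Subject: Hi\nClick here\n");
        List<String> b = store.store("Subject: Hello\nClick here\n");
        // The repeated line is the same String object in both emails.
        System.out.println(a.get(1) == b.get(1)); // true
        System.out.println(read(a).equals("Subject: Hi\nClick here\n")); // true
    }
}
```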

Trying to do it with an IPC TCP ping-pong for each and every line, though? Well, I'm not sure you could process Mailinator quantities of email within any reasonable hardware budget...

Luckily you have a chance to see the error of your ways :)

Well, I never claimed Redis should be used for this, you are the one who asked how to do it.

But you have to remember that most people can't keep important data in just one process: it's going to crash, and then your data is gone. The LMAX guys solved this in a cool way, but I wouldn't call it easy: http://martinfowler.com/articles/lmax.html#KeepingItAllInMem...
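A much-simplified sketch of the journaling idea behind keeping everything in memory: write each change to an append-only log on disk before applying it, and rebuild the in-memory state by replaying the log on restart. It is not LMAX's actual implementation (the linked article covers their approach, including snapshots); the class name `JournaledStore`, the tab-separated line format, and the fsync-per-write policy are assumptions for illustration, and keys and values are assumed not to contain tabs or newlines.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory key/value store with a write-ahead journal. */
public class JournaledStore implements AutoCloseable {
    private final Map<String, String> state = new HashMap<>();
    private final FileOutputStream file;
    private final BufferedWriter journal;

    public JournaledStore(Path journalPath) throws IOException {
        if (Files.exists(journalPath)) {
            // Replay: re-apply every recorded write in order to rebuild the map.
            for (String line : Files.readAllLines(journalPath, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) state.put(parts[0], parts[1]); // skip lines without a tab
            }
        }
        file = new FileOutputStream(journalPath.toFile(), true); // append mode
        journal = new BufferedWriter(new OutputStreamWriter(file, StandardCharsets.UTF_8));
    }

    public synchronized void put(String key, String value) throws IOException {
        journal.write(key + "\t" + value + "\n");
        journal.flush();
        file.getFD().sync(); // force the record to disk before changing memory
        state.put(key, value);
    }

    public synchronized String get(String key) {
        return state.get(key); // reads never touch the disk
    }

    @Override
    public synchronized void close() throws IOException {
        journal.close();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("journal", ".log");
        try (JournaledStore store = new JournaledStore(path)) {
            store.put("greeting", "hello");
        }
        // Simulate a restart: a fresh instance replays the journal.
        try (JournaledStore restarted = new JournaledStore(path)) {
            System.out.println(restarted.get("greeting")); // hello
        }
        Files.delete(path);
    }
}
```

Syncing on every write is the simplest durable choice but also the slowest; batching several writes per sync trades a small window of possible loss for throughput.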
