

Show HN: Stash, a graph-based cache for Node.js and Redis - nkohari

Last night, while trying to figure out the best way to implement caching in my app, I had an idea for a dirt-simple caching system based on a dependency graph.<p>The premise I started with is that one of the hardest things to manage in a cache is dependencies between entities. In order to cache items effectively, you inevitably have to duplicate "child" data inside of "parent" entries. Then, when a child is changed, you have to invalidate the child and its parents, and its parents' parents, and so on.<p>To try to help this, I hacked together a simple Node.js library called Stash, which models the cached values as a graph. When you invalidate an item, Stash will walk the graph and invalidate any items that depend on the item that was marked as invalid.<p>The code is available here:<p>https://github.com/nkohari/stash<p>I'm not suggesting that this is by any means a revolutionary idea; it just started as a mental exercise and now I'm wondering if there's any value to continuing to improve it as a library.<p>I'm also interested in what you find to be the most difficult part about caching, and how Stash could be improved to help.<p>Any feedback is appreciated, but bear in mind this is just a few hours worth of work and it has quite a few rough edges. Thanks!
======
doug1001
My sense is that you are on to something. The reason i say that is that a
decades-old technique (though by no means primitive or out-dated) used in
operating system caches (and more recently in web apps, but far less often) is
a Bayesian filter. The motivation was not explicitly connected to resolving
dependencies (afaik) but rather the more general problem of how to exploit
prior state (last resource requested by client) to predict future, or next
state (the likelihood that a given resource will be requested next). Of
course, this technique can be implemented as a directed acyclic graph, or
"Bayesian Network". in any event, i like your idea, and i'm not aware of the
same technique having been applied to the same problem, but my knowledge of
server-side web dev is below the mean.

------
tlack
This problem has been bouncing around in my head since I first started using
memcache. Figuring out how to easily and correctly invalidate cache items is
an interesting problem.

Perhaps instead of explicitly defining dependencies, there could be some way
of modeling your cached data as some kind of set of objects which would then
understand their own graph. As you can see I haven't quite figured out how it
would work but this is how I've been thinking about it lately. Like a simple
ORM, in some way.

Something like the following terrible psuedocode:

    
    
      cached_comment(key) = new cached_item(function(key){ query("select * from comments where..")});
    
      cached_post(key) = new cached_item(cached_comment, function(key){query("...")});

~~~
nkohari
This was actually my original plan, but I took the easy way out for now. :)

We use MongoDB as our data store, and between the UUIDs used as primary
document keys and arrays of UUIDs representing links between documents, we
could (reasonably) easily determine the dependencies automatically.

------
AffableSpatula
I did some research on this a couple of years ago - the aim was to come up
with a mechanism that extended HTTP so any existing caching infrastructure
could pick it up and implement it. I wrote up a summary of the research here:
[http://restafari.blogspot.com/2010/04/link-header-based-
inva...](http://restafari.blogspot.com/2010/04/link-header-based-invalidation-
of.html)

Since then Mark Nottingham and myself have written this up as an internet
draft:

[http://tools.ietf.org/html/draft-nottingham-linked-cache-
inv...](http://tools.ietf.org/html/draft-nottingham-linked-cache-inv-01)

------
gsiener
Very cool. We've just started implementing OLAP cube type caching using sorted
sets in Redis and the speed gains are impressive. Checking this out now.

------
klahnakoski
I am unfamiliar with the general problem you are solving for, but I am
interested in graph dependencies in general. May I ask, How many nodes are in
a typical graph? Is there a possibility of cycles?

Thanks

~~~
nkohari
In this case, there would be one node for each cacheable item. In this case,
it's likely one node for each cacheable record in your data store, plus one
node for each cacheable collection.

Cycles are reasonably possible, if two cached items depend upon each other.
For example, if you store the text of the last comment a user made on HN
inside the "user" cache entry, and the user edited their most-recent comment,
you would have to invalidate not only the comment but also the user.

Stash treats dependencies as a DAG (directed acyclic graph). During traversal
when a node is invalidated, it's aware of the potential of cycles and won't
backtrack over paths its already examined.

------
villagefool
Where can an example for how to use such cacheing in when serving pages using
node be found? Sorry, first time I am hearing about this...

