

How to Add Django Database Caching in 5 Minutes - Toddward
http://tech.yipit.com/2011/12/07/django-queryset-caching%E2%80%A6or-how-we-slayed-the-beast-known-as-mysql-joins/

======
justin_vanw
Or you can use a database where join performance isn't pathological, and scale
out by adding read slaves.

Caching is hard. Caching as an automated 'layer' is much harder. If it were
possible to cache in a general way _databases would do it already_. Adding a
'caching layer' is opening a gate into hell. The thing is, at first it will
be fine. Thousands of engineer hours and hundreds of subtle bugs later, you'll
(if you're wise) realize that opening that door let out many demons, you just
didn't have the eyes to see them at the time.

"There are only two hard things in Computer Science: cache invalidation and
naming things." -- Phil Karlton

~~~
bjpless
MySQL does already automatically cache queries internally. However, the scale
of that caching is limited, in part because it's difficult to scale MySQL
horizontally in the same way as Memcached or other distributed caching
systems.

Adding read slaves is a great idea on many levels. Slaves, however, are not
without overhead, and stale data and extra programming complexity exist there too.

~~~
famousactress
The caching that MySQL does is effectively useless unless all of your queries
deserve equal priority (weighted by their individual footprint) for memory. That's
almost never the case. I don't want relatively low-impact-rare-reads to take
up space that I'd prefer to use for high-impact-common-reads, for instance.

It's nice that MySQL caches queries, but it doesn't solve the same problem
that an application-level cache does.

------
fdintino
The timing of this post is funny, as I just got finished reworking our fork of
django-cache-machine. As the post points out, the limitation of Cache Machine
as it is currently built is that only objects which are already within a
queryset can invalidate that queryset. This is fine for selects on primary
keys, but beyond that the invalidation logic is incomplete.

My changes ( <https://github.com/theatlantic/django-cache-machine/> ) inspect
the ORDER BY clauses (if there's a limit or offset) and WHERE constraints in
the query and save the list of "model-invalidating columns" (i.e., columns
which, when changed, should broadly invalidate queries on the model) in a key
in the cache. They also associate these queries with a model-flush list. Then,
in a pre_save signal, the code checks whether any of those columns have changed
and marks the instance to invalidate the associated model flush list when it
gets passed into the post_save signal. We have these changes running on a few
of our sites and, if all goes well, we're looking to move the invalidation to
the column level to make the cache-hit ratio even higher.
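
To make the flow concrete, here is a plain-Python sketch of the mechanism described above. This is not the actual django-cache-machine code; names like MODEL_INVALIDATING_COLUMNS and the dict standing in for memcached are illustrative assumptions.

```python
cache = {}  # stand-in for a distributed cache like memcached

# columns which, when changed, should broadly invalidate queries on the model
MODEL_INVALIDATING_COLUMNS = {"article": {"slug", "published"}}


def flush_list_key(model):
    return "flush:%s" % model


def cache_query(model, sql, result):
    """Cache a query result and register it on the model's flush list."""
    cache[sql] = result
    cache.setdefault(flush_list_key(model), set()).add(sql)


def pre_save(model, old_row, new_row):
    """Return True if any model-invalidating column changed."""
    changed = {c for c in new_row if old_row.get(c) != new_row[c]}
    return bool(changed & MODEL_INVALIDATING_COLUMNS.get(model, set()))


def post_save(model, needs_flush):
    """Invalidate every cached query on the model's flush list."""
    if needs_flush:
        for sql in cache.pop(flush_list_key(model), set()):
            cache.pop(sql, None)
```

A save that changes `slug` would then flush every query registered for the model, while a save touching only other columns would leave the cache alone.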

~~~
bjpless
This is an interesting approach. I'm definitely going to check it out. Thanks!

------
zzzeek
Took a look at the Django Cache Machine they mention, at the invalidation
scheme (<http://jbalogh.me/projects/cache-machine/#cache-manager>). It stores
a "flush list", linking objects to their originating, cached queries.
Interesting, though it looks like something that could get out of hand quickly -
are they storing the "flush list" itself in the cache (else how do all nodes
learn of the invalidation)? That's interesting, though a little creepy (the
list gets very large, as they appear to be keying on the SQL itself?). Then
they have the invalidation flow along all the relationships - maybe that's OK,
but maybe it leads to a lot of excessive invalidation. They also have a notion
of how to avoid a certain kind of race condition there, caching ``None``
instead of just deleting, but it's not clear to me how that helps - you really
need a version ID there if you want to prevent that particular condition (else
thread 1 puts None, thread 2 sees None and decides to put a new value there,
and thread 3, which started before both of them, doesn't see the None and puts
stale data in).

Really, if you're caching SQL queries and such, you should be doing little to
no modification of the cached data - this library makes it seem "easy", which
it's not.

~~~
bjpless
Your point is well taken; however, as the writer of the article, I'd say this.
Our business requirements dictate that slightly inconsistent data is
acceptable in certain circumstances. We would not retrieve objects through the
caching flow in situations where absolute data integrity is required.

That point makes all the difference, in my mind. As to your point about
excessive invalidation, that depends on your read/write workload.

~~~
zzzeek
well if I have a system where inconsistent data is OK, I just let the normal
cache expiration time logic handle that. Turning down expiration times to a
few minutes, even 30 seconds, can still take lots of load off of particular
"hot" objects that might be fetched multiple times quickly (such as from a
series of AJAX requests) while making it unlikely that large inconsistencies
will show up on the screen.
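
The expiration-only approach needs no invalidation machinery at all. A generic sketch (plain Python rather than Django's cache API, with an injectable clock so the behavior is easy to see):

```python
import time


class TTLCache:
    """Tiny time-based cache: entries silently expire after `timeout`
    seconds, so staleness is bounded without any invalidation logic."""

    def __init__(self, timeout=30, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_set(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]            # still fresh: no database hit
        value = compute()              # expired or missing: recompute
        self._store[key] = (now + self.timeout, value)
        return value
```

A hot object fetched by a burst of AJAX requests would hit the database once per 30-second window, however many requests arrive in between.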

------
jmoiron
full disclosure: johnny-cache author

The top commenter in OP gave a great rundown of these projects and their
evaluation of them at YCharts at a Django NYC meetup a month-ish ago; I'm sure
his slides are available on the nyc django site somewhere.

All of these projects "automatically" manage cache for querysets, but they do
it different ways, and can be susceptible to poor performance under different
usage patterns.

From what I can tell, JC adds the lowest amount of overhead to cache misses and
hits, and uses the simplest (it's mildly sophisticated, but still
straightforward) management algorithm. It's the only one that works fine when
using UPDATE queries that do not mention row ids, and (as a result) is the one
that most greedily invalidates on writes.
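
One way to get that combination - correctness even for UPDATEs that don't mention row ids, at the cost of greedy invalidation - is table-level generation numbers baked into every cache key. The sketch below is my illustration of that idea, not johnny-cache's actual code; the function names are invented.

```python
cache = {}
generations = {}  # table name -> int


def query_key(sql, tables):
    """Key embeds each touched table's current generation number."""
    gens = ",".join("%s:%d" % (t, generations.get(t, 0)) for t in sorted(tables))
    return "%s|%s" % (gens, sql)


def cached_query(sql, tables, run):
    key = query_key(sql, tables)
    if key not in cache:
        cache[key] = run()  # miss: hit the database
    return cache[key]


def on_write(table):
    # any INSERT/UPDATE/DELETE - even without row ids - bumps the
    # generation, orphaning every cached result that touched the table
    generations[table] = generations.get(table, 0) + 1
```

The write path never has to know which cached queries exist: bumping the generation makes all of them unreachable at once, which is exactly the "greedy" behavior described above.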

The others are fine projects run by smart people, and depending on your site's
situation, I'd recommend some of them over johnny-cache. It's a good idea to
evaluate them all, as they certainly did at YCharts (his section on JC was
very accurate), and as OP seems to have done.

~~~
gtaylor
We use johnny-cache on a few projects with great results, thanks for your
work!

------
ceol
Thanks so much for linking to this post. I was just thinking how I would go
about caching QuerySets for a Django project.

