
Caching Antipatterns - ingve
https://www.hidefsoftware.co.uk/2016/12/25/the-caching-antipattern/
======
barrkel
The key problem with caching, not alluded to in this article, isn't that it
hides performance problems or is hard to invalidate or causes slow startups.
It's that it entangles state in a global way.

Responses to a request are usually dependent on a wide variety of different
internal states: properties of the user, cookies, state in one or more
databases often across multiple tables, or state in multiple dependent
services.

A cached value is a snapshot of a calculation over this internal state: but it
needs to be invalidated when any of the states that it is composed from
changes. What used to be a simple local calculation now becomes a global
concern; anything that modifies any of the underlying states needs to be aware
of the global state in the cache, and evict or invalidate cached items that
were dependent on the underlying state.

No longer is local reasoning enough. You need to have a global understanding,
possibly across multiple subsystems.
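A minimal sketch of that entanglement (all names hypothetical): a value cached from two independent stores forces every writer of either store to know about, and evict, the cached entry.

```python
# Hypothetical sketch: a cached value derived from two independent states.
# Every writer to either state must now know about the cache and evict it.

cache = {}

user_names = {1: "alice"}
org_plans = {1: "free"}

def greeting(user_id, org_id):
    """Pretend-expensive value derived from two separate stores."""
    key = ("greeting", user_id, org_id)
    if key not in cache:
        cache[key] = f"{user_names[user_id]} ({org_plans[org_id]})"
    return cache[key]

def rename_user(user_id, name):
    user_names[user_id] = name
    # Without this global knowledge, greeting() serves stale data:
    for key in [k for k in cache if k[0] == "greeting" and k[1] == user_id]:
        del cache[key]

def change_plan(org_id, plan):
    org_plans[org_id] = plan
    # The same burden repeats for every writer of every underlying state.
    for key in [k for k in cache if k[0] == "greeting" and k[2] == org_id]:
        del cache[key]
```

The point is not the eviction code itself but that `rename_user` and `change_plan`, which used to be purely local operations, now each carry knowledge of a derived value computed somewhere else.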

Caching worthy of the name - caching things that are complex to calculate in
time or space - inflicts serious architectural harm on a software system. It
impairs maintenance and understanding of the system, and will cause bugs when
people who don't have sufficient global understanding make local changes.

It's not all bad. Caching static assets, or a small well-understood layer
(like a disk cache), or designing calculations in such a way that dependencies
have immutable mappings from identifiers to values - there are ways to make
caching work. But it needs to be done very carefully in a large system or you
risk long-term damage.
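One way to read "immutable mappings from identifiers to values" is content-addressed caching: if the key is derived from the input itself, a given key can never map to a different value, so entries can go unused but never stale. A hedged sketch with hypothetical names:

```python
import hashlib

cache = {}

def render(content: bytes) -> str:
    """Stand-in for an expensive derivation of a value from its input."""
    return content.decode().upper()

def render_cached(content: bytes) -> str:
    # The key is a hash of the content, so the mapping from key to value
    # is immutable: new content gets a new key, and no writer anywhere
    # ever needs to invalidate an existing entry.
    key = hashlib.sha256(content).hexdigest()
    if key not in cache:
        cache[key] = render(content)
    return cache[key]
```

Invalidation disappears as a correctness concern; only eviction for space remains, which is a local policy rather than a global one.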

~~~
kazinator
> _isn't that it hides performance problems or is hard to invalidate ..._

> _... it needs to be invalidated when any of the states that it is composed
> from change ... need to have a global understanding, possibly across
> multiple subsystems_

:)

~~~
barrkel
"hard to invalidate" doesn't really capture the nature of the architectural
harm.

------
greenleafjacob
Not listed: temporal locality. A lot of naive caches inflate in size to
satisfy a high hit ratio but end up with low accesses per item. Etsy's mctop
is useful here although not possible if you're using e.g. Elasticache.
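The underlying metric is simple to instrument yourself when traffic-sniffing tools aren't an option. A toy sketch (hypothetical class, not mctop's actual mechanism, which sniffs memcached network traffic):

```python
from collections import Counter

class InstrumentedCache:
    """Toy cache that counts per-key accesses, so items that inflate the
    cache without earning repeat hits can be identified."""

    def __init__(self):
        self.store = {}
        self.accesses = Counter()

    def get_or_compute(self, key, compute):
        self.accesses[key] += 1
        if key not in self.store:
            self.store[key] = compute()
        return self.store[key]

    def cold_items(self, threshold=2):
        # Keys cached but rarely re-read: high memory cost, low benefit.
        return [k for k in self.store if self.accesses[k] < threshold]
```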

Not listed: persistent caches. A friend of mine ran a cache for a CDN. Their
caching solution mushroomed in complexity to satisfy the legal obligation of
taking down illegal content because their cache persisted to disk, so when
servers came back up from maintenance they had to check in with a master
revocation list. This made operations somewhat complicated because the system
was failsafe - an old server would refuse to start up if it detected that it
had been offline longer than the retention of the revocation list (a 1-week
Kafka queue). Lots of
horror stories about zombie resurrected content being served before this was
implemented.
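The startup check described above might look roughly like this (a speculative sketch; names and the retention constant are assumptions, not the friend's actual system):

```python
# Hypothetical sketch: a persistent cache node compares its downtime
# against the retention window of the revocation log. If it was offline
# longer than the log covers, it may have missed takedowns and cannot
# prove its disk cache is clean, so it must refuse to start.

RETENTION_DAYS = 7  # assumed one-week Kafka-style retention window

def safe_to_start(days_offline: int, revocations: set, cache: dict) -> bool:
    if days_offline > RETENTION_DAYS:
        return False  # failsafe: refuse to serve possibly-zombie content
    for key in revocations:
        cache.pop(key, None)  # purge anything revoked while we were down
    return True
```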

Somewhat alluded to: over-reliance on the cache. At my former job, a
high-traffic website sat behind a fantastically fast, high-hit-rate cache for
almost the entire site, necessary for the scale they were operating at. The
trouble was
the cache shielded the system from so many requests it became a single point
of failure, so if the cache was flushed then the system would become quickly
overloaded and wouldn't recover the hit rate for an hour or so. We had to
invest a lot of time and energy in cache replication systems, consistent
hashing, and related systems - mostly careful choice of existing systems.
Shoutout to mcrouter by Facebook.
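The consistent-hashing part of that mitigation can be sketched in a few lines: keys map to the nearest node clockwise on a hash ring, so losing or adding one cache node only remaps a small fraction of keys instead of flushing the whole fleet. A minimal illustration, not what mcrouter actually ships:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}#{i}".encode()).hexdigest(), 16)
                self.ring.append((h, node))
        self.ring.sort()

    def node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # First ring position clockwise from the key's hash (wrapping).
        i = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[i][1]
```

With naive modulo hashing (`hash(key) % n`), changing `n` remaps nearly every key; here, adding a node only moves the keys that now fall in its arcs of the ring.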

------
alkonaut
Article seems to be about (but fails to mention) web backends. Whether or not
caching things on startup is a good or bad idea is probably different between
a client and a server app.

~~~
wongarsu
I think the article holds equally well for clients and servers of all kind
(web or otherwise).

I guess you are alluding to the special case of single-page web applications
prefetching resources at startup? Here the article also holds true: either you
can make your dependencies fit for purpose - making all resources small enough
to be fetched as needed - or you have to admit defeat to a dependency you
don't control: the user's internet connection.

Or were you thinking about a different scenario where clients are justified in
caching on startup?

~~~
alkonaut
I think it depends on what "caching" includes. But any time you need zero UI
delay later and can afford an initial delay of almost any length, caching as
much as possible up front is usually the right decision.

In e.g. a game scenario we might call this "precomputing" or "preloading" but
it's really equivalent to memoization and caching of any other kind (you load
the whole level rather than a part, you precompute trig tables rather than
wait for the first access of sin(123) and so on).
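The trig-table example is a one-liner in practice. A sketch of the pay-once-at-load tradeoff (table size is an arbitrary choice):

```python
import math

# Precompute at load time: pay the cost once up front so later lookups
# are a constant-time table read instead of a sin() call on demand.
TABLE_SIZE = 3600  # tenth-of-a-degree resolution (arbitrary)
SIN_TABLE = [math.sin(math.radians(i / 10)) for i in range(TABLE_SIZE)]

def fast_sin(degrees: float) -> float:
    return SIN_TABLE[int(degrees * 10) % TABLE_SIZE]
```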

I make a thick (non-game) desktop app where quite a lot of data is computed on
startup because 500ms spent during a splash screen is ok, but a 500ms delay to
a user input is not.

For web or database requests, probably not - which is why I can't imagine a
web scenario where caching on startup is justified.

------
bagrow
My first (incorrect) thought when reading the headline was that they were
building a library of anti-patterns. Not sure how to do that, seems like a
challenge to data mine, but it would be very cool to have. Anyone know of such
a thing?

~~~
hathawsh
From the granddaddy of all wikis:
[http://wiki.c2.com/?AntiPatternsCatalog](http://wiki.c2.com/?AntiPatternsCatalog)

~~~
nilved
Somebody should add their new JS-based design to this list

------
nrjdhsbsid
Every time I've seen a cache in web applications it's because the underlying
system is poorly designed.

We have frameworks that can handle thousands of requests per second on
regular hardware; that's enough for a single machine to run 99.9% of the
sites on the internet.

Take WordPress for example. It's so dog slow that caching is basically
required, since even powerful servers can only manage 10-20 pages per second.

If WordPress were running on Java or Go instead of PHP, caching would be
totally unnecessary.

~~~
dismantlethesun
Wordpress typically requires caching not simply because the PHP response time
is so slow, but because it puts out so many SQL queries that it'll overburden the
database. If instead of caching, you go to horizontal scaling on wordpress,
you'll quickly find that the database becomes the bottleneck.

~~~
nrjdhsbsid
Wordpress is its own beast with horizontal scaling too. It's quite difficult
to run a load-balanced wp instance simply because it was never designed for
that. We used to run a ton of wp sites at my last place, and if the cache
wasn't enough the usual course of action was to throw it on a server/database
combo that was practically a supercomputer. PHP takes up way too much RAM per
thread and it's just slow and terrible to work with on popular sites :(

~~~
dismantlethesun
Was using PHP7 helpful? It reportedly has lower memory consumption.

