
my approach to caching

1. cache expensive computation on read.

2. invalidate cache entry on update/delete.

for CRUD-like applications this is trivial, since you have separate interfaces for read (GET) and update (POST).
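Here is a minimal sketch of that read/write split in Python, assuming a simple in-process dict as the cache; expensive_report and the tiny stand-in "database" are made up for illustration, not taken from the comment above:

    import threading, time

    _db = {1: {"total": 100}}    # stand-in for the real database
    _cache = {}                  # key -> cached value
    _lock = threading.Lock()     # guard concurrent access

    def expensive_report(report_id):
        time.sleep(0.1)          # simulate a costly query/aggregation
        return dict(_db[report_id])

    def get_report(report_id):
        # read path (GET): serve from cache, compute and store on miss
        key = ("report", report_id)
        with _lock:
            if key in _cache:
                return _cache[key]
        value = expensive_report(report_id)
        with _lock:
            _cache[key] = value
        return value

    def update_report(report_id, fields):
        # write path (POST/PUT/DELETE): update the source of truth,
        # then drop the cache entry so the next read recomputes it
        _db[report_id].update(fields)
        with _lock:
            _cache.pop(("report", report_id), None)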



There are many problems that may arise, although they may not be too important for your use case.

- On write, how do you know which entries need to be invalidated? Imagine data that is computed by multi-way joins: any input that has changed may require invalidating all affected cache entries (see the sketch after this list).

- Your cache is rarely strongly consistent with your database, so your update and invalidation will happen at slightly different times, which can cause issues.
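To make the first point concrete, here is a rough sketch (my own illustration, not something described in this thread) of the bookkeeping a write needs when cached values come from multi-way joins: a reverse index from each input row to every cache key derived from it.

    from collections import defaultdict

    cache = {}                  # cache_key -> computed value
    deps = defaultdict(set)     # (table, row_id) -> cache keys built from that row

    def cache_join_result(cache_key, value, input_rows):
        # input_rows: (table, row_id) pairs the computation actually read
        cache[cache_key] = value
        for row in input_rows:
            deps[row].add(cache_key)

    def on_write(table, row_id):
        # a write to any input row invalidates every entry derived from it
        for key in deps.pop((table, row_id), set()):
            cache.pop(key, None)

    # e.g. a cached order summary built from a 3-way join:
    cache_join_result("order_summary:42", {"total": 99},
                      [("orders", 42), ("customers", 7), ("products", 3)])
    on_write("products", 3)     # price change drops the summary for order 42

Without something like this dependency index (or coarser invalidation by table, or TTLs), the write path simply doesn't know which cached join results it just made stale.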


thank you for the great questions.

Re 1: I can't give you a universal answer, but what I usually use is a multi-tiered cache:

1. global cache [using the filesystem, or an in-memory db]

2. cache with scope limited to the current user session [using the webserver's session mechanism]

Add more tiers if your app's architecture needs it (AZ level, for example). You can also use a different cache eviction strategy for each cache level (manual eviction/lifetime-based eviction/LRU/etc.).

once you have this, it becomes easier to figure out which entries at which levels need to be invalidated. The hard part becomes maintaining this logic, but it is a good trade-off since you get a great performance boost and scalability.
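As a rough illustration of the tiering (my own sketch; the per-session dict, the LRU size, and the function names are assumptions, not the poster's actual setup):

    from collections import OrderedDict

    class GlobalCache:
        # tier 1: shared across users, LRU-evicted
        def __init__(self, max_entries=1024):
            self._data = OrderedDict()
            self._max = max_entries

        def get(self, key):
            if key in self._data:
                self._data.move_to_end(key)      # mark as recently used
                return self._data[key]
            return None

        def put(self, key, value):
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._max:
                self._data.popitem(last=False)   # evict least recently used

        def invalidate(self, key):
            self._data.pop(key, None)

    GLOBAL_CACHE = GlobalCache()

    def get_cached(session, key, compute):
        # tier 2 first: the user's session dict (most local, no sharing)
        if key in session:
            return session[key]
        # tier 1 next: the shared cache; fall through to the real work on miss
        value = GLOBAL_CACHE.get(key)
        if value is None:
            value = compute()
            GLOBAL_CACHE.put(key, value)
        session[key] = value
        return value

    def invalidate(session, key):
        # invalidate at every tier that may hold the entry
        session.pop(key, None)
        GLOBAL_CACHE.invalidate(key)

Each tier can then get its own eviction policy, as mentioned above: the session tier dies with the session, while the global tier here is LRU-bounded.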

Re 2: an UPDATE/DELETE on the database usually takes a couple of microseconds on most RDBMSs when using an index lookup, and cache invalidation completes within roughly the same time. In my use cases, having this 4-5 ms window when a cache entry is outdated is pretty good.



