
Based on his response on Twitter, I'd guess not: https://twitter.com/dormando/status/1381706098290225152

But I also agree with you that high concurrency is usually not needed when using distributed caching. As I said in my tweet response (https://twitter.com/thinkingfish/status/1382039915597164544), IO dominates. And there are failure domains to consider. However, I've been asked a few times now about using the storage module by itself, or via messages over shared memory, in a local setting. That may very well present different requirements on cache scalability.




> However, if this design is used as a local cache over shared memory for a high throughput system (trading? ML feature store?) the scalability can be handy.

In that case, one might be concerned about the hit rates. While FIFO & LRU have been shown to work very well for a remote cache, especially in social network workloads, they are a poor choice in many other cases. Database, search, and analytical workloads are LFU & MRU biased due to record scans. I'd be concerned that Segcache's design is not general purpose enough and relies too heavily on optimizations that work for Twitter's use cases.
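
To make the scan problem concrete, here's a toy sketch (plain Java, a LinkedHashMap standing in for an LRU cache; not Segcache or any production implementation): one pass over a large table evicts the entire hot working set.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class LruScanDemo {
      public static void main(String[] args) {
        final int capacity = 100;
        // LinkedHashMap in access-order mode, evicting the eldest entry: a toy LRU.
        Map<Integer, Integer> lru = new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
            return size() > capacity;
          }
        };
        // Hot working set that fits entirely in the cache and is re-read constantly.
        for (int key = 0; key < capacity; key++) lru.put(key, key);
        // A single record scan touches each cold key once, and each miss gets cached...
        for (int key = 1_000; key < 100_000; key++) lru.put(key, key);
        // ...so by the end, every hot key has been pushed out and the next reads all miss.
        long hotStillCached = 0;
        for (int key = 0; key < capacity; key++) if (lru.containsKey(key)) hotStillCached++;
        System.out.println("hot keys still cached after the scan: " + hotStillCached); // prints 0
      }
    }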

Unfortunately, as applications have dozens of local caches, they are rarely analyzed and tuned. Instead, implementations have to monitor the workload and adapt. Popular local caches can concurrently handle 300M+ reads/s, use an adaptive eviction policy, and leverage O(1) proactive expiration. As they are much smaller, there is less emphasis on minimizing metadata overhead and more on system performance, e.g. using memoization to avoid SerDe round trips to the remote cache store. See, for example, Expedia using a local cache to reduce their db reads by 50%, which allowed them to remove servers, hit SLAs, and absorb spikes (at a cost of ~500 MB) [1].

[1] https://medium.com/expedia-group-tech/latency-improvement-wi...
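
As a rough sketch of what such a local cache looks like in code (Caffeine here; RemoteStore and UserProfile are made-up placeholder types for illustration, not a real API):

    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;
    import java.time.Duration;

    class LocalCacheSketch {
      interface RemoteStore { UserProfile fetch(String key); }  // hypothetical remote store client
      record UserProfile(String name) {}                        // hypothetical value type

      static LoadingCache<String, UserProfile> build(RemoteStore remote) {
        return Caffeine.newBuilder()
            .maximumSize(100_000)                    // bounded; eviction adapts to the workload
            .expireAfterWrite(Duration.ofMinutes(5)) // entries are proactively expired
            .recordStats()                           // so the hit rate can actually be monitored
            .build(remote::fetch);                   // a miss loads from the remote store, then is memoized
      }
    }

With that, cache.get(key) only pays the network and deserialization cost on a miss, and cache.stats().hitRate() tells you whether the ~500 MB is earning its keep.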


I might be dumb about estimating throughput. According to https://github.com/Cyan4973/xxHash, the best hash functions can only do hundreds of millions of hashes per second, so how can a local cache run at such throughput? I assume that when measuring cache throughput, one needs to compute the hash, look up the entry, (maybe) compare keys, and copy the data.


Are you comparing a single-threaded hash benchmark to a multi-threaded cache benchmark? An unbounded concurrent hash table can do 1B reads/s on a 16-core machine (~60M ops/s per thread).


Oh, my bad, thank you for pointing that out! BTW, just out of curiosity, where is the 1B reads/s benchmark?


A multi-threaded benchmark of a cache should be fully populated and use a scrambled Zipfian distribution. This emulates hot/cold entries and highlights the areas of contention (locks, CASes, etc.). Lock-free reads benefit from CPU cache efficiency, which can produce super-linear growth.

This shows whether the implementation could be a bottleneck and whether it scales well enough; beyond that, the hit rate and other factors are more important than raw throughput. I would rather sacrifice a few nanos on a read than suffer much lower hit rates or long pauses on a write due to eviction inefficiencies.

[1] https://github.com/ben-manes/caffeine/wiki/Benchmarks#read-1...

[2] https://github.com/ben-manes/caffeine/blob/master/caffeine/s...
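
Roughly the shape of such a harness, as a sketch (not the linked benchmark, and for real numbers you'd want JMH or similar to avoid measurement artifacts): fully populate, then have every thread read keys drawn from a scrambled Zipfian distribution.

    import java.util.Arrays;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.LongAdder;

    class ZipfReadBench {
      static final int ITEMS = 1 << 20;                 // fully populate before measuring
      static final long SCRAMBLE = 0x9E3779B97F4A7C15L; // odd multiplier: bijective "scramble" of ranks
      static final double[] cdf = new double[ITEMS];    // cumulative Zipfian weights (skew ~0.99)

      public static void main(String[] args) throws InterruptedException {
        double sum = 0;
        for (int i = 0; i < ITEMS; i++) { sum += 1.0 / Math.pow(i + 1, 0.99); cdf[i] = sum; }
        for (int i = 0; i < ITEMS; i++) cdf[i] /= sum;

        ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>(ITEMS);
        for (long rank = 0; rank < ITEMS; rank++) map.put(rank * SCRAMBLE, rank);

        int threads = Runtime.getRuntime().availableProcessors();
        LongAdder reads = new LongAdder();
        long deadline = System.nanoTime() + 5_000_000_000L; // measure for ~5 seconds
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
          workers[t] = new Thread(() -> {
            long local = 0;
            while (System.nanoTime() < deadline) {
              if (map.get(nextKey()) != null) local++;  // keep the read alive so it isn't optimized away
            }
            reads.add(local);
          });
          workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.printf("%,d reads/s across %d threads%n", reads.sum() / 5, threads);
      }

      static long nextKey() { // sample a Zipfian rank, then spread it across the key space
        int idx = Arrays.binarySearch(cdf, ThreadLocalRandom.current().nextDouble());
        int rank = idx >= 0 ? idx : -idx - 1;
        return rank * SCRAMBLE;
      }
    }

The scrambling keeps the hot keys from clustering in one region of the table, so the contention you see comes from the data structure rather than from an artifact of the key order.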


Thanks! I commented here and then realized it might be on Twitter. I appreciate you answering the same question in both places :-)


lol I'm glad you asked there since I am on HN about once every 3 months...



