Hacker News
Scaling Memcache at Facebook (2021) (micahlerner.com)
67 points by greghn on June 19, 2023 | 20 comments



I never understood the diehard love for Redis when memcache always scaled better across nodes and was multithreaded, making better use of on-node resources.


Redis is simply more useful if you're not pushing max throughput using it purely as a cache. It has really useful support for a variety of data structures and operations, and I'd guess that Memcached's (previous) 1 MB limit per item drove a good number of people to look for other options over time.


The 1 MB limit was a big driver for a long time. Also, Redis had a persistence option.


Eons ago now (circa 2009/2010 I think), when working on Pornhub we switched from memcached to Redis for a few reasons:

- memcache had little to no observability, which made debugging cache issues a nightmare. Redis, on the other hand, is a dream to work with on that front: from the well-thought-out CLI to the fact that you can copy the db file to your machine and go at it, it's just a much better developer experience.

- Data structures: Redis' lists, hashes and sets meant we could represent a lot of stuff as-is in cache with no need for costly serializing. It also meant the data layer code was simpler than the memcached equivalent.

- Built-in pub/sub: we used it for tons of stuff, and the fact that it came built in was big, as prior to that we had to provision specific stacks to get the same functionality (and they weren't as nice to use).

memcache has its uses, but for the little advantage it gave (perf-wise), Redis just buried it in developer experience.
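The data-structure point above can be made concrete. The sketch below (plain dicts standing in for the two servers; real code would use clients like redis-py or pymemcache, and all names here are illustrative) shows why field-level access matters: with memcached, the whole object is one opaque blob, so touching one field means a full deserialize/reserialize round-trip in application code, whereas a Redis hash lets you update the field in place.

```python
import json

# Stand-ins for the two servers in this sketch: memcached stores
# opaque bytes, Redis can store a hash of fields per key.
memcached = {}   # key -> serialized blob
redis_hash = {}  # key -> {field: value}, like a Redis HASH

user = {"id": 42, "name": "alice", "views": 0}

# memcached style: fetch, deserialize, mutate one field, reserialize, store.
memcached["user:42"] = json.dumps(user).encode()
obj = json.loads(memcached["user:42"])
obj["views"] += 1
memcached["user:42"] = json.dumps(obj).encode()

# Redis-hash style: the server holds fields natively (think HSET/HINCRBY),
# so the app touches only the one field, with no serialization layer.
redis_hash["user:42"] = {"id": "42", "name": "alice", "views": "0"}
redis_hash["user:42"]["views"] = str(int(redis_hash["user:42"]["views"]) + 1)

print(json.loads(memcached["user:42"])["views"])  # 1
print(redis_hash["user:42"]["views"])             # 1
```

Both paths end at the same value, but the memcached path pays the serialization cost on every partial update.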


Because most of us use Redis for vastly more than a kv. Memcached is excellent if you only have basic cache needs.


redis is a swiss army knife. memcached is a single blade.


I've only used Redis a bit, and memcached not at all yet. But sometimes the superior alternative loses out simply due to bad luck, i.e. Redis gained traction and that was it. Or perhaps getting started with Redis is easier and more straightforward than with memcached.

Or it could be any of the features that Redis has which memcached does not that contributed to this outcome.


- Redis offers a much richer set of commands than memcached, making it easier to integrate

- Redis documentation is very good, likely leading to more adoption

- Most applications are not performance limited, but developer resource limited, making performance considerations secondary


Try harder? No one could explain it to you, though.


Memcached continues to get updates; its latest release is from June 2023 [1]. Compared to the last time I checked, it now has a new meta protocol, external flash storage by default, a proxy mode [2], and many improvements in the networking core.

I vaguely remember their front page used to have Facebook as a sponsor. Either that is no longer the case, or I remembered it wrong. Does Facebook still use Memcache?

[1] https://github.com/memcached/memcached/wiki/ReleaseNotes1621

[2] https://memcached.org/blog/proxy-intro/



Reading the article I didn't see it, but does FB have any innovation/papers/infra around "entity A changed, so cache keys x/y/z need to be invalidated"? Basically similar to Noria, mentioned at the beginning of the post.


There is a diagram of a process that reads the MySQL commit log and uses that to send Memcache delete commands. Not exactly groundbreaking.

But I would also guess that FB would have to explain lots of their middleware concepts in order to explain how data in the database maps to memcache keys defined in middleware.
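The pipeline in that diagram can be sketched in a few lines. This is a toy, not the real system (the paper calls Facebook's version "mcsqueal"); the table-to-keys mapping and all names below are illustrative assumptions:

```python
# Sketch of binlog-tailing invalidation: a daemon reads committed MySQL
# binlog entries, maps each changed row to the memcache keys derived
# from it, and issues deletes. All names here are illustrative.

def keys_for_row(table, row):
    # The hard middleware part: knowing which cache keys a DB row feeds.
    # Hardcoded mapping for the sketch.
    if table == "users":
        return [f"user:{row['id']}", f"user_by_email:{row['email']}"]
    return []

def process_binlog(entries, cache):
    purged = []
    for table, row in entries:  # entries already committed to MySQL
        for key in keys_for_row(table, row):
            cache.pop(key, None)  # memcache "delete"; idempotent
            purged.append(key)
    return purged

cache = {"user:1": b"...", "user_by_email:a@b.c": b"...", "post:9": b"..."}
binlog = [("users", {"id": 1, "email": "a@b.c"})]
purged = process_binlog(binlog, cache)
print(purged)         # ['user:1', 'user_by_email:a@b.c']
print(sorted(cache))  # ['post:9']
```

Because deletes are driven by the commit log rather than by application code, invalidations can't be skipped by a code path that forgets to call them.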

Personally, I think the hard part is ensuring the entire company/system accesses data through the correct abstractions (eg. the newest data model code) and does not bypass the data model code (eg. by hitting the database directly).


There are a few layers above this as well as safeguards to help make sure the right abstractions are used.


So the cache can be out of sync with the database until that log processing finishes issuing the deletes?


The binlogs are applied first, then the deletes are emitted.

There's also support for read-after-write consistency within a single cluster: a marker is added alongside the memcache key, warning clients to read from the master replica after the write succeeds. That's a really cool feature for a multi-region, eventually consistent system.
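The marker trick described above can be sketched as follows. This is a simplified, single-process model (the markers, stores, and TTL value are all illustrative assumptions): a write sets a short-lived marker first, and any read that sees the marker bypasses the possibly stale local replica and goes to the master.

```python
import time

# Sketch of remote-marker read-after-write consistency.
markers = {}        # key -> marker expiry time (regional marker pool)
master_db = {}      # authoritative store in the master region
local_replica = {}  # asynchronously replicated, may lag

MARKER_TTL = 5.0  # illustrative; long enough to cover replication lag

def write(key, value):
    markers[key] = time.time() + MARKER_TTL  # set the marker first
    master_db[key] = value                   # then write to the master
    # async replication to local_replica happens some time later

def read(key):
    # A live marker means the local replica may not have the write yet,
    # so serve this read from the master instead.
    if markers.get(key, 0) > time.time():
        return master_db.get(key)
    return local_replica.get(key)

write("user:7:name", "alice")
print(read("user:7:name"))  # 'alice', even though the replica lags
```

Once the marker expires (after replication has caught up), reads fall back to the cheap local replica.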


Not sure about what's been published, but from what I recall, for data stored in memcache it was manual: people would need to know the right keys to invalidate when doing DB mutations (naturally there were functions encoding this, so multiple queries updating, say, a user could all just call a single function which returned the cache keys for that user).
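That manual pattern, with a single function as the source of truth for an entity's cache keys, might look like this (a sketch; the entity, key names, and helpers are made up for illustration):

```python
# One function owns the cache keys for a user; every mutation path
# calls it, so invalidation logic lives in exactly one place.

def user_cache_keys(user_id, email):
    return [f"user:{user_id}", f"user_by_email:{email}"]

def update_user(db, cache, user_id, email, **fields):
    db[user_id] = {**db.get(user_id, {}), "email": email, **fields}
    for key in user_cache_keys(user_id, email):
        cache.pop(key, None)  # invalidate; next read repopulates

db = {}
cache = {"user:3": {"name": "stale"}}
update_user(db, cache, 3, "x@y.z", name="fresh")
print("user:3" in cache)  # False
```

The failure mode, as the parent comment notes, is any write path that mutates the DB without going through the function.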


That’s correct :)


This can be handled fairly gracefully by using a caching library with tag support. On a webpage that collates content from disparate parts of a CMS, you just tag the page's content cache with the IDs of all the entities that make it up. Then you define a save/delete handler that purges caches tagged with the ID of whatever you just altered.
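A toy version of that tag mechanism (not any particular library's API; the reverse index and names are illustrative):

```python
from collections import defaultdict

# Tag-based invalidation: cached pages carry entity-ID tags, and a
# save/delete handler purges everything tagged with the changed entity.
cache = {}                    # key -> rendered page
tag_index = defaultdict(set)  # tag -> set of cache keys using it

def cache_page(key, html, tags):
    cache[key] = html
    for tag in tags:
        tag_index[tag].add(key)

def purge_tag(tag):
    # Called from the entity's save/delete handler.
    for key in tag_index.pop(tag, set()):
        cache.pop(key, None)

cache_page("/home", "<html>...</html>", tags={"article:1", "banner:7"})
purge_tag("article:1")   # article 1 was edited
print("/home" in cache)  # False
```

The reverse index is what makes "entity changed, purge keys x/y/z" a single lookup instead of per-page bookkeeping.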


So what if people want to call it "Memcache", right?




