Hacker News
Building a Cache in Elixir (openmymind.net)
88 points by revorad on July 12, 2022 | 26 comments



One thing you can do is use :ets.slot/2 to randomly probe the cache and evict expired entries. It's simple: every few milliseconds, have a process check a random sample of keys (say, 50) and see which have expired. If more than 25% of the sample had expired, repeat immediately; once the expired fraction drops below 25%, go back to periodic polling.

This is how Redis versions prior to 6.0 implemented key expiry. It's a very simple algorithm and can prove quite efficient at evicting expired keys without having to constantly scan the entire range.
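A sketch of the probing loop described above; the module name, the {key, value, expires_at} table layout, and the constants are all assumptions. One deviation: ETS doesn't expose its slot count, so rather than picking truly random slots this walks them with a wrapping cursor via :ets.slot/2, which approximates random probing over time.

```elixir
defmodule ProbeSweep do
  @batch 50
  @threshold 0.25

  # Probes a batch of slots; if more than 25% of the checked entries had
  # expired, sweeps again immediately, otherwise returns the next cursor
  # so the caller can poll again later.
  def sweep(table, cursor \\ 0) do
    now = System.monotonic_time(:millisecond)
    {checked, expired, next} = probe(table, cursor, now, 0, 0)

    if checked > 0 and expired / checked > @threshold do
      sweep(table, next)
    else
      next
    end
  end

  defp probe(_table, cursor, _now, checked, expired) when checked >= @batch,
    do: {checked, expired, cursor}

  defp probe(table, cursor, now, checked, expired) do
    case :ets.slot(table, cursor) do
      # Walked past the last slot: end this batch and wrap the cursor.
      :"$end_of_table" ->
        {checked, expired, 0}

      objects ->
        dead =
          for {key, _value, expires_at} <- objects, expires_at < now do
            :ets.delete(table, key)
          end

        probe(table, cursor + 1, now, checked + length(objects), expired + length(dead))
    end
  end
end
```

In a real cache this would run inside a dedicated process that reschedules itself with Process.send_after/3.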


How does Redis do it now in 6.0 and later?


Why not use a min heap to track expiry times? (Or a doubly-linked list, if lifetimes are uniform?)


The min heap would need to live in the process heap of a particular process. It would only be accessible by that process, so other processes would need to interact with it by sending messages to the "min heap" process and waiting for responses. The min heap process would become a massive throughput bottleneck at much lower QPS than ETS, just due to the overhead of processing messages. This would be exacerbated by GC pauses as the min heap grows beyond around 100k or 1M entries, whereas an ETS table can handle 100M entries or more (I've never actually hit a limit).

Large, mutable, shared state like a cache works much better if it lives in ETS. Any process can read/write to a properly configured table in a few microseconds. Scales very well to large data sizes and high QPS.

The random probing approach has the disadvantage that it stores expired entries longer than necessary, which either wastes memory or reduces the cache hit rate. Either of those seems preferable to capping throughput and maximum entry count, as the min heap would.


that was my immediate first question


> I've always been a firm believer that micro-optimization is a critical task for developers to undertake. In the worst case, it tends to be a great teacher. It's also only through such efforts that informed decisions, based on previously gained knowledge and facts, can be made about what truly is and isn't worth pursuing with respect to performance.

This is why I like hobby projects. In my day-to-day work, my job is to deliver value and "get stuff done". However, in hobby projects, I can feel free to dive into an optimization rabbit hole and learn a whole bunch of stuff. Even if I don't ship my hobby projects, I still gain a lot from the background knowledge.


I don't really know Elixir (though I'm interested in it), but this blog post was incredibly readable. Large font, good syntax highlighting, used most (but not all) of the laptop screen width.

A joy to look at in a sea of hard-to-read blog posts


On my mobile phone the font is stupid large though, making it impossible to read.


Good thing browser agents are just that, agents for the user. Most (if not all) have controls over the font-size, so you can make it larger/smaller at will, unless the website actively tries to defeat zooming, which this one doesn't.

For me, the font was also too large, but 0.5 seconds later it wasn't. But there's no need to comment on something like this (same with the parent), as it's not actually about the article, just about something website-specific.


How many people actually use a mobile browser to make things smaller? Usage is probably 1 percent. He has a valid issue.


You can even get a simple cache running quite quickly with a GenServer and ETS.

About the post, I'm not sure what dcache offers over Nebulex though.
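A minimal sketch of the GenServer-plus-ETS approach the parent mentions; the module name and table options are assumptions. The GenServer exists only to own the table, while reads and writes go straight to ETS from the calling process, so it never becomes a bottleneck.

```elixir
defmodule SimpleCache do
  use GenServer

  def start_link(opts \\ []),
    do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Reads hit ETS directly from the caller; no GenServer round trip.
  def get(key) do
    case :ets.lookup(__MODULE__, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  def put(key, value), do: :ets.insert(__MODULE__, {key, value})

  @impl true
  def init(_opts) do
    # A :public :named_table lets any process read/write without messaging us;
    # the table lives exactly as long as this process does.
    :ets.new(__MODULE__, [:set, :public, :named_table, read_concurrency: true])
    {:ok, %{}}
  end
end
```

A real cache would still need TTL handling and size limits on top of this, which is most of the remaining work.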


If you want a simple, easily understood cache and don't want to write all the logic using GenServers and ETS, you could use dcache.

Nebulex looks more complicated than dcache, just comparing the getting-started info: "In order to give more flexibility and loading only needed dependencies, Nebulex makes all its dependencies as optional." Then it goes on to describe shards, decorators, telemetry, and external adapters like Cachex and Redis. So before I can run `mix deps.get` to include this package, I have to make decisions, or at least read through what those things are for this package.

I prefer picking simple dependencies and add complexity later on if needed.


> I prefer picking simple dependencies and add complexity later on if needed.

That's exactly what having optional dependencies does, though. Except that when you inevitably realize that e.g. you need to monitor your cache-hit ratio in production, enabling that means adding a suggested dep and flipping a config flag to use it. The alternative is finding your own separate dep that does that thing, then gluing the two "simple" components together with code that is only maintained by you (rather than upstream), that must necessarily touch only the outside interfaces of both components, and that may therefore be many times less efficient than code that hooks into one component or the other.


In my experience people get wooed by libraries that have a lot of configurations, settings, optional features, etc. and then rarely use them. Just pick a simpler tool that does specifically what you need at the time. Also, modularize it so your entire code base isn’t impacted by swapping out a cache tool (or http, etc.)


I do know what you mean, but the Elixir ecosystem is fairly exceptional in this regard. (It helps that Elixir is a compiled language with macros; many libraries implement optional-dep support via the compile-time-macro equivalent of the strategy pattern, where you only get the runtime code you ask for, rather than a bunch of useless calls to hooks that have no subscribers in your codebase. And if you're imagining that results in awful codegen code, it doesn't; in many cases, this is just as simple as putting an if statement directly into a module body, with method definitions inside the if branches.)
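A sketch of that compile-time pattern; the module name and the optional :telemetry dependency are illustrative assumptions. The if runs at module compile time, so only one pair of definitions ever exists in the compiled code:

```elixir
defmodule Cache.Instrumenter do
  # Code.ensure_loaded?/1 is evaluated while the module compiles, so the
  # branch not taken leaves no trace in the resulting beam file.
  if Code.ensure_loaded?(:telemetry) do
    # Real instrumentation, compiled in only when the optional dep is present.
    def hit(key), do: :telemetry.execute([:cache, :hit], %{count: 1}, %{key: key})
  else
    # Without the dep, calls compile down to a no-op.
    def hit(_key), do: :ok
  end
end
```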

It's also helped by an eye toward low coupling of these support libraries. Elixir's Phoenix web framework, for example, is "batteries included" in some senses; but all those batteries are only included because your own generated skeleton code pulls them in, not because the framework itself depends on them in any way. So if you don't need e.g. i18n, or views, or really anything beyond an API that responds with JSON, then you can delete ~five lines from your skeleton project, drop the relevant deps, and the resulting project will be a lightweight web framework that just serves API calls. (This is why there isn't really a "lightweight" Elixir web framework à la Flask/Sinatra/etc.; Phoenix is both Elixir's batteries-included web framework and its lightweight/minimal web framework, depending on how you use it.)


I agree, but setting up a basic Nebulex cache is on par with most Elixir "app wide" libraries.

But yeah I guess the docs can be a bit overwhelming at first glance.

Personally I like the decorators, especially as Ecto does not provide a cache solution.


Nebulex shines when you need a distributed cache. ETS is per-node so you'd need to handle synchronizing the data if you ever scale horizontally.

To keep it simple, you could rewrite this post using ex_shards as it handles scaling ETS (which IIRC is the backend by default for Nebulex).

That said, I was able to write a distributed session store with Nebulex pretty quickly. Then extended it to support live sessions, though it's not up to feature parity with its ETS counterpart if anyone wants to help out with my PR https://github.com/pentacent/phoenix_live_session/pull/14


The purging is a scan with `next()` and a `lookup()` for each item. Since the logic is a simple `<` comparison, could this be done with a single `match_delete` instead? (Unfortunately, ETS match specifications are quite wonky, so without a lot of fiddling myself, I can't suggest exactly what the comparison logic would look like, but I know that you can express certain things like comparisons with it.) That would cut down on the back-and-forth between the elixir process and the ETS process.


> Unfortunately, ETS match specifications are quite wonky

Matchspecs are gnarly! Very annoying to wrap your head around, and it's closer to being an AST than anything.

Shameless plug: I had written a library at one point to make matchspecs with Mnesia easier, but it might also work with ETS (haven't tested unfortunately): https://github.com/queer/lethe


Yes, match_delete would work. Even better, select_delete returns the number of entries that were purged. To delete everything where the third element of a tuple is greater than a constant, it should be something like:

  :ets.select_delete(table, [{{:_, :_, :"$1"}, [], [{:>, :"$1", {:const, expires}}]}])


That's what `ets:fun2ms` is for, at least in Erlang. Not sure how that maps to Elixir?
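Elixir can't use Erlang parse transforms, so there's no direct fun2ms equivalent in the standard library; the community Ex2ms package provides a comparable macro, or the match spec can be written by hand. A hand-written sketch, assuming {key, value, expires_at} entries and using the guard position for the comparison:

```elixir
table = :ets.new(:purge_demo, [:set, :public])
now = System.monotonic_time(:millisecond)
:ets.insert(table, [{:stale, :v, now - 1_000}, {:fresh, :v, now + 60_000}])

# Delete every row whose third element (the expiry) is less than now;
# select_delete returns how many rows it removed.
match_spec = [{{:_, :_, :"$1"}, [{:<, :"$1", now}], [true]}]
deleted = :ets.select_delete(table, match_spec)  # deletes only :stale
```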


"nil is greater than any integer", that was astounding. Which languages have a no-data is bigger than max_int rule?


There is a total order on (almost) all data types in the BEAM. This is useful for arbitrary sorting and comparison of different types.

I say almost because a float and its equivalent integer are distinct terms, yet they compare equal in the total order. In practice this is fine, and I have never heard of it causing a problem.
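The order in question is: number < atom < reference < function < port < pid < tuple < map < list < bitstring. A few checkable examples:

```elixir
true = 1 < :some_atom      # every number sorts before every atom
true = 999_999_999 < nil   # nil is just an atom, so it sorts after any integer
true = 1 == 1.0            # mixed int/float comparison is numeric...
false = 1 === 1.0          # ...but strict equality still distinguishes them
true = {:a} < [:a]         # tuples sort before lists
```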


For me, "no data" is either NaN or zero, but +infinity was the astounding part.

Also, in some languages, not failing with an exception when comparing different strong types could generally be a source of bugs.


In Elixir, nil is an atom (a global enum namespace); it's not a special data type that has meaning outside of a convention (heavily supported by the stdlib) that "you should use it to mean nothing"[0]. To hammer this home: in Erlang, nil carries no special meaning and isn't understood by that convention (Erlang code often returns undefined instead).

Elixir/BEAM is strongly typed. There is no coercion going on when you are doing comparisons.

[0] note that in elixir, "if" is in the stdlib, it's sugar over case, and the falsiness of nil is software-level (only false and nil are falsy in elixir, which is sane).


The conclusion here is, it’s probably not worth it unless you’re ok with reducing performance.



