One thing you can do is use :ets.slot/2 to do random probing of the cache and evict entries. The approach is simple: every few milliseconds, have a process sample a random set of keys (say, 50) and check whether they have expired, evicting the ones that have. If more than 25% of the sampled keys were expired, repeat the process again immediately; once the share of expired keys in a batch drops below 25%, go back to polling periodically.
This is how Redis versions prior to 6.0 implemented key expiry. It's a very simple algorithm and can prove quite efficient at evicting expired keys without having to constantly scan the entire range.
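A minimal sketch of what that probing sweeper might look like, assuming a :set table named :cache holding {key, value, expires_at} tuples (module name, tuple layout, and timings are mine, not from the article):

    defmodule Cache.Sweeper do
      # Hypothetical sketch of the probing eviction described above.
      use GenServer

      @table :cache
      @sample_size 50
      @interval 100  # ms between sweeps when few keys are expiring

      def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

      @impl true
      def init(_opts) do
        schedule(@interval)
        {:ok, %{}}
      end

      @impl true
      def handle_info(:sweep, state) do
        {expired, sampled} = sweep()

        if sampled > 0 and expired / sampled > 0.25 do
          # Over 25% of the sample was expired: sweep again right away.
          send(self(), :sweep)
        else
          schedule(@interval)
        end

        {:noreply, state}
      end

      # Probe random slots and delete expired entries.
      # Returns {number_expired, number_sampled}.
      defp sweep do
        now = System.system_time(:millisecond)
        # Use the table size as a rough bound for slot indexes;
        # out-of-range probes are simply skipped.
        bound = max(:ets.info(@table, :size), 1)

        sampled =
          1..@sample_size
          |> Enum.map(fn _ -> :rand.uniform(bound) - 1 end)
          |> Enum.flat_map(&probe/1)

        expired =
          Enum.count(sampled, fn {key, _value, expires_at} ->
            if expires_at < now do
              :ets.delete(@table, key)
              true
            else
              false
            end
          end)

        {expired, length(sampled)}
      end

      defp probe(index) do
        case :ets.slot(@table, index) do
          :"$end_of_table" -> []
          objects -> objects
        end
      rescue
        # :ets.slot/2 raises if the index is out of range.
        ArgumentError -> []
      end

      defp schedule(ms), do: Process.send_after(self(), :sweep, ms)
    end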
The min heap would need to live in the process heap of a particular process. It would only be accessible by that process, so other processes would need to interact with it by sending messages to the "min heap" process and waiting for responses. The min heap process would become a massive throughput bottleneck at much lower QPS than ETS, just due to the overhead of processing messages. This would be exacerbated by GC pauses as the min heap grows beyond around 100k or 1M entries, whereas an ETS table can handle 100M entries or more (I've never actually hit a limit).
Large, mutable, shared state like a cache works much better if it lives in ETS. Any process can read/write a properly configured table in a few microseconds, and it scales very well to large data sizes and high QPS.
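For reference, "properly configured" here might look something like this (table name and options are illustrative):

    :ets.new(:cache, [
      :set,
      :public,
      :named_table,
      read_concurrency: true,
      write_concurrency: true
    ])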
The random probing approach has the disadvantage that it keeps expired entries around longer than necessary, which either wastes memory or reduces the cache hit rate. Either of these seems preferable to limiting throughput and maximum entry count, as the min heap would.
> I've always been a firm believer that micro-optimization is a critical task for developers to undertake. In the worst case, it tends to be a great teacher. It's also only through such efforts that informed decisions, based on previously gained knowledge and facts, can be made about what truly is and isn't worth pursuing with respect to performance.
This is why I like hobby projects. In my day-to-day work, my job is to deliver value and "get stuff done". However, in hobby projects, I can feel free to dive into an optimization rabbit hole and learn a whole bunch of stuff. Even if I don't ship my hobby projects, I still gain a lot from the background knowledge.
I don't really know Elixir (though I'm interested in it), but this blog post was incredibly readable. Large font, good syntax highlighting, and it utilized most (but not all) of the laptop screen width.
A joy to look at in a sea of hard-to-read blog posts
Good thing browser agents are just that, agents for the user. Most (if not all) have controls over the font-size, so you can make it larger/smaller at will, unless the website actively tries to defeat zooming, which this one doesn't.
For me, the font was also too large, but 0.5 seconds later, it wasn't. But no need to comment about something like this (same with the parent) as it's not actually about the article, just about something website specific.
If you want a simple, easily understood cache and don't want to write all the logic using GenServers and ETS, you could use dcache.
Nebulex looks more complicated than dcache, just comparing the getting started info: "In order to give more flexibility and loading only needed dependencies, Nebulex makes all its dependencies as optional." Then it goes on to describe shards, decorators, telemetry, and external adapters like Cachex and Redis. So before I can run `mix deps.get` to include this package, I now have to make decisions, or at least read through what those things are for this package.
I prefer picking simple dependencies and adding complexity later on if needed.
> I prefer picking simple dependencies and adding complexity later on if needed.
That's exactly what having optional dependencies does, though. Except that when you inevitably realize that e.g. you need to monitor your cache-hit ratio in production, enabling that means adding a suggested dep and then flipping a config flag to use it. The alternative is finding your own separate dep that does that thing and then gluing the two "simple" components together with code that is maintained only by you (rather than the upstream), and which must necessarily touch only the outside interface of both components, and so may be many times less efficient than code that hooks into one component or the other.
In my experience people get wooed by libraries that have a lot of configurations, settings, optional features, etc. and then rarely use them. Just pick a simpler tool that does specifically what you need at the time. Also, modularize it so your entire code base isn’t impacted by swapping out a cache tool (or http, etc.)
I do know what you mean, but the Elixir ecosystem is fairly exceptional in this regard. (It helps that Elixir is a compiled language with macros; many libraries implement optional-dep support via the compile-time-macro equivalent of the strategy pattern, where you only get the runtime code you ask for, rather than a bunch of useless calls to hooks that have no subscribers in your codebase. And if you're imagining that results in awful codegen code, it doesn't; in many cases, this is just as simple as putting an if statement directly into a module body, with method definitions inside the if branches.)
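A hypothetical sketch of that pattern, assuming an optional :telemetry dependency (module and event names are made up, not from any particular library):

    defmodule MyCache.Instrumentation do
      # Compile real telemetry hooks only if the optional :telemetry
      # dependency is present at compile time; otherwise compile a no-op.
      if Code.ensure_loaded?(:telemetry) do
        def emit(event, measurements, metadata) do
          :telemetry.execute([:my_cache | event], measurements, metadata)
        end
      else
        def emit(_event, _measurements, _metadata), do: :ok
      end
    end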
It's also helped by an eye toward low coupling of these support libraries. Elixir's Phoenix web framework, for example, is "batteries included" in some senses, but all those batteries are only included because your own generated skeleton code pulls them in, not because the framework itself depends on them in any way. So if you don't need e.g. i18n, or views, or really anything beyond an API that responds with JSON, then you can delete ~five lines from your skeleton project, drop the relevant deps, and the resulting project will be a lightweight web framework that just serves API calls. (This is why there isn't really a "lightweight" Elixir web framework ala Flask/Sinatra/etc.; Phoenix is both Elixir's batteries-included web framework and its lightweight/minimal web framework, depending on how you use it.)
Nebulex shines when you need a distributed cache. ETS is per-node so you'd need to handle synchronizing the data if you ever scale horizontally.
To keep it simple, you could rewrite this post using ex_shards, as it handles scaling ETS (which IIRC is the default backend for Nebulex).
That said, I was able to write a distributed session store with Nebulex pretty quickly. Then extended it to support live sessions, though it's not up to feature parity with its ETS counterpart if anyone wants to help out with my PR https://github.com/pentacent/phoenix_live_session/pull/14
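For what it's worth, the cache definition itself stays small. A rough sketch, assuming Nebulex v2 where the partitioned adapter ships in the main package (newer versions split adapters into separate deps):

    defmodule MyApp.Cache do
      # Partitions entries across connected nodes instead of keeping
      # everything node-local like plain ETS.
      use Nebulex.Cache,
        otp_app: :my_app,
        adapter: Nebulex.Adapters.Partitioned
    end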
The purging is a scan with `next()` and a `lookup()` for each item. Since the logic is a simple `<` comparison, could this be done with a single `match_delete` instead? (Unfortunately, ETS match specifications are quite wonky, so without a lot of fiddling myself, I can't suggest exactly what the comparison logic would look like, but I know that you can express certain things like comparisons with them.) That would cut down on the back-and-forth between the Elixir process and ETS.
> Unfortunately, ETS match specifications are quite wonky
Matchspecs are gnarly! Very annoying to wrap your head around, and they're closer to being an AST than anything.
Shameless plug: I had written a library at one point to make matchspecs with Mnesia easier, but it might also work with ETS (haven't tested unfortunately): https://github.com/queer/lethe
Yes, match_delete would work. Even better, select_delete will return the number of entries that were purged. To delete everything where the third element in a tuple is greater than a constant, it should be something like:
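Something along these lines, assuming {key, value, expiry} tuples, a table named :cache, and a threshold variable (names are mine):

    # Deletes every object whose third element is > threshold and
    # returns how many objects were deleted.
    :ets.select_delete(:cache, [
      {{:_, :_, :"$1"}, [{:>, :"$1", threshold}], [true]}
    ])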
There is a total order on (almost) all data types on the BEAM. This is useful for arbitrary sorting and comparison of different types.
I say almost because a float and its equivalent integer are distinct terms, yet they are equal in the total order. In practice this is fine, and I have never heard of it causing a problem.
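A quick illustration of both points:

    iex> 1 == 1.0       # equal under the total order
    true
    iex> 1 === 1.0      # but still distinct terms
    false
    iex> 1 < :an_atom   # numbers sort before atoms...
    true
    iex> :an_atom < {}  # ...and atoms before tuples, and so on
    true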
In Elixir, nil is an atom (a global enum namespace); it's not a special data type that has meaning outside of a convention (heavily supported by the stdlib) that "you should use it to mean nothing"[0]. To hammer this home: that convention isn't generally understood in Erlang, where nil is meaningless and functions conventionally return undefined instead.
Elixir/BEAM is strongly typed. There is no coercion going on when you are doing comparisons.
[0] note that in elixir, "if" is in the stdlib, it's sugar over case, and the falsiness of nil is software-level (only false and nil are falsy in elixir, which is sane).
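Concretely:

    iex> is_atom(nil)
    true
    iex> nil == :nil    # nil is literally the atom :nil
    true
    iex> if nil, do: "truthy", else: "falsy"
    "falsy"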