
Erlang Scheduler Details (2016) - StreamBright
https://hamidreza-s.github.io/erlang/scheduling/real-time/preemptive/migration/2016/02/09/erlang-scheduler-details.html
======
sb8244
This is a great post to know about if you do Erlang/Elixir professionally.
There was a conversation back in 2016 that is still relevant today:
[https://news.ycombinator.com/item?id=11064763](https://news.ycombinator.com/item?id=11064763)

------
whizzkid
It is so underrated how easy it is to use scheduling with Erlang/Elixir. Plus,
the Supervisor/children concept makes it so easy to have a system that runs
for years.

~~~
latch
I'm a full time elixir dev (and fanboy) but don't agree with your second
point.

The only thing "easy" about Erlang supervisors is having a short-lived DB
network hiccup cascade up your supervisor tree and shut down your app.
Clearly supervisors were designed for a set of problems which don't align with
what I imagine most Elixir devs are likely to run into.

A lot of people end up building either a special supervisor with infinite
retry + backoff, or bake the logic directly into their process, specifically
to avoid triggering the built-in supervisor.
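
A minimal sketch of the second option, with retry and backoff baked into the process itself so a failed connection never reaches the supervisor. `BackoffWorker`, the injected `connect` function, and the timing constants are hypothetical names chosen for illustration; they are not part of any library:

```elixir
defmodule BackoffWorker do
  use GenServer

  # Illustrative values only.
  @base_ms 10
  @max_ms 1_000

  # `connect` is a hypothetical zero-arity function returning
  # {:ok, conn} or {:error, reason}.
  def start_link(connect), do: GenServer.start_link(__MODULE__, connect)

  def status(pid), do: GenServer.call(pid, :status)

  @impl true
  def init(connect) do
    # Connect asynchronously so init/1 never fails on a DB outage.
    send(self(), :connect)
    {:ok, %{connect: connect, conn: nil, attempts: 0}}
  end

  @impl true
  def handle_info(:connect, state) do
    case state.connect.() do
      {:ok, conn} ->
        {:noreply, %{state | conn: conn, attempts: 0}}

      {:error, _reason} ->
        # Stay alive and try again later instead of crashing.
        Process.send_after(self(), :connect, backoff(state.attempts))
        {:noreply, %{state | attempts: state.attempts + 1}}
    end
  end

  @impl true
  def handle_call(:status, _from, state) do
    status = if state.conn, do: :connected, else: :disconnected
    {:reply, status, state}
  end

  # Exponential backoff, capped at @max_ms.
  def backoff(attempts) do
    min(@base_ms * Integer.pow(2, min(attempts, 10)), @max_ms)
  end
end

# Usage: a fake connection that fails twice, then succeeds.
{:ok, counter} = Agent.start_link(fn -> 0 end)

connect = fn ->
  case Agent.get_and_update(counter, fn n -> {n, n + 1} end) do
    n when n < 2 -> {:error, :econnrefused}
    _ -> {:ok, :fake_conn}
  end
end

{:ok, pid} = BackoffWorker.start_link(connect)
Process.sleep(200)
IO.inspect(BackoffWorker.status(pid))
```

The trade-off is that the process now has to represent "not connected" as an ordinary state, which is exactly the logic the built-in supervisor would otherwise have (badly) stood in for.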

If you want a system that runs for years, you're almost certainly going to
need an external language-agnostic supervisor (god, upstart, supervisord,
docker, ...). At that point, the built-in Supervisor advantage is...overstated.

What isn't overstated is the advantage isolated processes have when it comes
to managing complexity. It's a fundamental shift: technical debt can be made
more or less a non-factor.

~~~
dnautics
> is having a short-lived DB network hiccup cascade up your supervisor tree
> and shut down your app

Can you link me to a description of this? I am currently working on software
that expects DB tx on the order of 10/second (so I have literally never had
this problem), but am transitioning to a project where I expect to have
10k-100k tx/second. What causes the hiccup, and why is it not taken care of
with sensible defaults in the typical libraries (Ecto, e.g.)? I'd love to have
an idea of what could be coming down the pike for me (to be defensive).

> an external language-agnostic supervisor (god, upstart, supervisord, docker,
> ...)

So some of my coworkers are old C++ programmers who were FAANG hotshots and
imagine it possible to do everything in C++. They don't write unit tests, have
just learned Go "because it's better" and don't know Kubernetes. There is
literally no "high uptime/reliability" story in-house ATM.

~~~
latch
More generally:

A process (P) fails. Its supervisor (S1) restarts it. If (P) fails 3 times in
5 seconds (configurable, but with no backoff) then (S1) will fail. S1's parent
supervisor (S2) will now restart S1, which will restart (P) which might still
fail. If P fails quickly enough, you'll cascade your "restart max of X times
in Y seconds" all the way up to the application, the ultimate supervisor,
which itself will shut down after 3 failures.
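
The restart limit can be observed directly. This is a rough sketch, assuming a hypothetical `FlakyWorker` that starts successfully and then always fails; `max_restarts: 3` and `max_seconds: 5` are spelled out here but match Elixir's documented defaults. The supervisor tolerates three restarts and gives up when a fourth would be needed:

```elixir
defmodule FlakyWorker do
  use GenServer

  # `report_to` is the pid that gets notified on every (re)start.
  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    send(report_to, :worker_started)
    # Start successfully, then fail right away.
    send(self(), :fail)
    {:ok, report_to}
  end

  @impl true
  def handle_info(:fail, state), do: {:stop, :simulated_db_down, state}
end

# Trap exits so this script survives when the linked supervisor gives up.
Process.flag(:trap_exit, true)

{:ok, sup} =
  Supervisor.start_link([{FlakyWorker, self()}],
    strategy: :one_for_one,
    max_restarts: 3,
    max_seconds: 5
  )

# Wait for the supervisor itself to exit.
reason =
  receive do
    {:EXIT, ^sup, reason} -> reason
  after
    5_000 -> :timeout
  end

# Count how many times the worker was started in total.
count_starts = fn count_starts, n ->
  receive do
    :worker_started -> count_starts.(count_starts, n + 1)
  after
    100 -> n
  end
end

starts = count_starts.(count_starts, 0)
IO.inspect({starts, reason})
```

If this supervisor were itself a child of another supervisor, its exit would count as one failure against the parent's limit, which is how the cascade climbs the tree.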

For DBs specifically, DBConnection relies on trapping exit and reconnecting
with a backoff (it does not rely on supervisors), but how YOUR code deals with
(or, doesn't deal with) a failure can result in a cascade:

    
    
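        defmodule MyProcess do
          use GenServer

          ...

          # Fire-and-forget: callers never see the failure below.
          def something(data) do
            GenServer.cast(__MODULE__, {:something, data})
          end

          def handle_cast({:something, data}, state) do
            ...
            # query!/3 raises on a DB error, which crashes this process.
            # (state.conn is assumed to hold the Postgrex connection.)
            Postgrex.query!(state.conn, "lower case sql because we aren't monsters", [])
            ...
          end
        end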
    

If your DB goes down, this process will crash when something/1 is called. If
something/1 is called at a rate greater than the supervisors are configured to
accept, it'll take down the app.

~~~
macintux
One useful piece of advice I recall, perhaps from Mahesh Paolini-Subramanya,
was to write defensive code for predictable errors, despite the happy path
coding that Erlang allows.

So losing connectivity to a database should not result in a process failure,
and thus is something that a supervisor should never have to deal with.
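
As a sketch of what that defensive style might look like: the non-raising `Postgrex.query` returns `{:ok, result}` or `{:error, exception}`, so the process can match on the failure instead of crashing. To keep the example self-contained, `query_fun` below is a hypothetical stand-in for that call, and `DefensiveProcess` and its `failures` counter are illustrative names only:

```elixir
defmodule DefensiveProcess do
  use GenServer

  # `query_fun` is a hypothetical one-arity function returning
  # {:ok, result} or {:error, reason}, standing in for a real DB call.
  def start_link(query_fun), do: GenServer.start_link(__MODULE__, query_fun)

  def something(pid, data), do: GenServer.cast(pid, {:something, data})

  def failures(pid), do: GenServer.call(pid, :failures)

  @impl true
  def init(query_fun), do: {:ok, %{query: query_fun, failures: 0}}

  @impl true
  def handle_cast({:something, data}, state) do
    case state.query.(data) do
      {:ok, _result} ->
        {:noreply, state}

      {:error, _reason} ->
        # Predictable error: record it and keep running.
        {:noreply, %{state | failures: state.failures + 1}}
    end
  end

  @impl true
  def handle_call(:failures, _from, state), do: {:reply, state.failures, state}
end

# Usage: simulate a database that is down for every request.
down_db = fn _data -> {:error, :econnrefused} end
{:ok, pid} = DefensiveProcess.start_link(down_db)

for n <- 1..10, do: DefensiveProcess.something(pid, n)

IO.inspect({DefensiveProcess.failures(pid), Process.alive?(pid)})
```

What to do with a failed request (drop it, queue it, report it to the caller) is an application decision that this sketch leaves out.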

You’re right that there is no silver bullet internal to the language. Even the
BEAM itself can fail.

~~~
yellowapple
Better yet: add another layer of supervision. Erlang processes are cheap.
Might as well take advantage of that.
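
One way to read this, sketched under assumptions: put the crash-prone process under its own intermediate supervisor, so a burst of failures exhausts that layer's restart limit first and costs the top-level supervisor only a single restart. `CrashyWorker`, `DbLayer`, the `:sibling` process, and the figure of six crashes are all hypothetical. Six rapid crashes would exceed a single supervisor's limit of 3 restarts in 5 seconds, but here the top level survives and the sibling is never touched:

```elixir
defmodule CrashyWorker do
  use GenServer

  # `budget` is an Agent holding how many more times the worker should crash.
  def start_link(budget), do: GenServer.start_link(__MODULE__, budget)

  @impl true
  def init(budget) do
    send(self(), :maybe_fail)
    {:ok, budget}
  end

  @impl true
  def handle_info(:maybe_fail, budget) do
    case Agent.get_and_update(budget, fn n -> {n, max(n - 1, 0)} end) do
      0 -> {:noreply, budget}
      _ -> {:stop, :simulated_db_down, budget}
    end
  end
end

defmodule DbLayer do
  use Supervisor

  def start_link(budget), do: Supervisor.start_link(__MODULE__, budget)

  @impl true
  def init(budget) do
    # The extra layer has its own restart limit.
    Supervisor.init([{CrashyWorker, budget}],
      strategy: :one_for_one,
      max_restarts: 3,
      max_seconds: 5
    )
  end
end

Process.flag(:trap_exit, true)

# The worker will crash six times in quick succession, then stay up.
{:ok, budget} = Agent.start_link(fn -> 6 end)

# An unrelated process supervised next to the DB layer.
sibling = %{
  id: :sibling,
  start: {Agent, :start_link, [fn -> :state end, [name: :sibling]]}
}

{:ok, top} =
  Supervisor.start_link([sibling, {DbLayer, budget}], strategy: :one_for_one)

sibling_before = Process.whereis(:sibling)

# Give the crashes and restarts time to play out.
Process.sleep(500)

IO.inspect(
  top_alive: Process.alive?(top),
  sibling_untouched: Process.whereis(:sibling) == sibling_before,
  crashes_left: Agent.get(budget, & &1)
)
```

A worker that never stops failing would still exhaust both layers eventually, so this buys tolerance and isolation rather than immunity.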

