
Homemade Analytics with Ecto and Elixir - lobo_tuerto
https://dashbit.co/blog/homemade-analytics-with-ecto-and-elixir
======
dnautics
Great strategy, I love how elegant it is, but I do wonder about two things
regarding this method:

Testing it (especially with async integration/e2e tests, which I prefer) would
be tricky. I'd want some tests to verify that the "GenServer write cache"
counts more or less correctly under (potentially stressful) scenarios that I
can provoke by having other things run in the VM.

From what's presented, I'm not sure yet how to get this to work with
distribution, in case you want to drop this into a distributed Phoenix
setting. Do you start the pages as :global instead of registered? Do you
start a Registry with :global, and have the cluster restart it if the VM that
hosts it dies?

~~~
josevalim
Distribution is not necessary. Each server/node will have its own set of
processes doing upserts, which are atomic.

The testing story from Elixir is relatively straightforward too. I send a
bunch of requests and then ask the supervisor to shut down, which forces all
processes to terminate and perform their pending writes.
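
A minimal sketch of that testing approach, not the blog post's actual code:
`SketchCounter` and its `:sink` option are hypothetical names, and the
"pending write" is simulated by sending a message to the test process instead
of doing a database upsert. It assumes the worker traps exits so that
`terminate/2` runs when the supervisor shuts it down.

```elixir
defmodule SketchCounter do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Trapping exits makes terminate/2 run when the supervisor shuts us down.
    Process.flag(:trap_exit, true)

    {:ok,
     %{
       path: Keyword.fetch!(opts, :path),
       # Hypothetical stand-in for the database: a pid that receives the flush.
       sink: Keyword.fetch!(opts, :sink),
       count: 0
     }}
  end

  @impl true
  def handle_info(:bump, state), do: {:noreply, %{state | count: state.count + 1}}

  @impl true
  def terminate(_reason, state) do
    # Stand-in for the pending upsert performed on shutdown.
    send(state.sink, {:flushed, state.path, state.count})
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

{:ok, pid} =
  DynamicSupervisor.start_child(sup, {SketchCounter, path: "/home", sink: self()})

# "Send a bunch of requests" ...
for _ <- 1..100, do: send(pid, :bump)

# ... then ask the supervisor to shut down, which terminates the worker.
:ok = DynamicSupervisor.stop(sup)

flushed =
  receive do
    {:flushed, "/home", count} -> count
  after
    1_000 -> :timeout
  end
```

In an ExUnit test the supervisor would more likely be started with
`start_supervised!` and stopped with `stop_supervised`, with an assertion on
the database row in place of the `receive`.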

~~~
dnautics
> Each server/node will have its own set of processes doing upserts, which are
> atomic.

Perfect! Thank you for clarifying. Or maybe I should have read it more
carefully.

> The testing story

I keep forgetting ExUnit's start_supervised! exists.

------
losvedir
Very interesting! Thanks so much for sharing. I have a couple of questions:

1) The `bump` function interacts with the worker GenServers via `send(pid,
:bump)` and the GenServer has a `handle_info` for that. I would normally add a
client function to the worker module which uses a `GenServer.call(...)`
instead. Was that just to make the blog post a bit shorter, or are there
tradeoffs I'm missing?

2) In the case of millions of pages, the post suggests terminating the
processes. What about `hibernate`? Could that be an intermediate approach to
reach for, for some relatively large number of pages?

~~~
josevalim
1) I wanted it to be async as there is not much reason to wait for the process
since the database writes are async too. So either send/2 or cast/2 would be
fine.

2) Good call on hibernate. It could definitely be an option to consider on all
implementations.
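
A minimal sketch of both points, under assumptions: `HibernatingCounter` is a
hypothetical module, not the worker from the post. It accepts the bump through
either `send/2` or `GenServer.cast/2` (both asynchronous), and uses the
`:hibernate_after` option of `GenServer.start_link/3` so an idle worker
hibernates on its own. The 50 ms idle window is an arbitrary choice for
illustration.

```elixir
defmodule HibernatingCounter do
  use GenServer

  def start_link(opts \\ []) do
    # :hibernate_after is a standard GenServer option: after this many
    # milliseconds without a message, the process hibernates automatically.
    GenServer.start_link(__MODULE__, 0, Keyword.put_new(opts, :hibernate_after, 50))
  end

  # Asynchronous client function: returns immediately, like send/2.
  def bump(pid), do: GenServer.cast(pid, :bump)

  # Synchronous read, only here so the example can inspect the count.
  def count(pid), do: GenServer.call(pid, :count)

  @impl true
  def init(count), do: {:ok, count}

  # Reached via GenServer.cast(pid, :bump).
  @impl true
  def handle_cast(:bump, count), do: {:noreply, count + 1}

  # Reached via a plain send(pid, :bump).
  @impl true
  def handle_info(:bump, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:count, _from, count), do: {:reply, count, count}
end

{:ok, pid} = HibernatingCounter.start_link()

HibernatingCounter.bump(pid)
send(pid, :bump)

total = HibernatingCounter.count(pid)
```

A hibernated process keeps its state and wakes up on the next message, so it
trades some wake-up cost for a smaller memory footprint while idle.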

------
udfalkso
Could you perhaps have one process for all paths that just contains a map with
the counts and receives messages to increment? And then that map gets
periodically persisted to the db as a bulk update? Is a separate process per
path needed?
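
For illustration, the single-process variant described above might look
roughly like this. Everything here is an assumption: `SingleCounter`, the
`:sink` and `:every` options, and the `{:bulk_upsert, counts}` message are
hypothetical, and the bulk database update is simulated by sending the map to
another process.

```elixir
defmodule SingleCounter do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def bump(pid, path), do: GenServer.cast(pid, {:bump, path})

  @impl true
  def init(opts) do
    state = %{
      counts: %{},
      # Hypothetical stand-in for the database: a pid that receives the batch.
      sink: Keyword.fetch!(opts, :sink),
      # Flush interval in milliseconds.
      every: Keyword.get(opts, :every, 1_000)
    }

    schedule(state)
    {:ok, state}
  end

  @impl true
  def handle_cast({:bump, path}, state) do
    counts = Map.update(state.counts, path, 1, &(&1 + 1))
    {:noreply, %{state | counts: counts}}
  end

  @impl true
  def handle_info(:flush, state) do
    # Stand-in for one bulk upsert of all pending counts.
    if map_size(state.counts) > 0, do: send(state.sink, {:bulk_upsert, state.counts})
    schedule(state)
    {:noreply, %{state | counts: %{}}}
  end

  defp schedule(state), do: Process.send_after(self(), :flush, state.every)
end

# A long interval plus a manual :flush keeps the demo deterministic.
{:ok, pid} = SingleCounter.start_link(sink: self(), every: 60_000)

SingleCounter.bump(pid, "/")
SingleCounter.bump(pid, "/")
SingleCounter.bump(pid, "/blog")
send(pid, :flush)

totals =
  receive do
    {:bulk_upsert, counts} -> counts
  after
    1_000 -> :timeout
  end
```

Every bump for every path goes through this one mailbox, so the process
serializes all increments.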

~~~
dnautics
Strictly speaking it's not needed. However, you probably don't want to do
that, because each Erlang process is single-threaded with no shared memory
(but also no transactional locking, mutexes, or channels to worry about!), so
at a certain scale you will become bottlenecked on that one process. ETS
tables (an internal k-v lookup system) cheat a bit on the "no shared memory"
rule, and Registry is backed by them, which makes it a high-performance way
to route each path to the process that needs to handle it.
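
A rough sketch of that routing, with one process per path. `PathCounter`,
`Router`, `SketchRegistry`, and `SketchSupervisor` are hypothetical names, not
the ones from the post. It also handles the case where two callers race to
start the worker for the same path: the loser gets
`{:error, {:already_started, pid}}` and reuses the existing process.

```elixir
defmodule PathCounter do
  use GenServer

  def start_link(path) do
    # Register under the path so lookups go through the ETS-backed Registry.
    GenServer.start_link(__MODULE__, 0, name: {:via, Registry, {SketchRegistry, path}})
  end

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_info(:bump, count), do: {:noreply, count + 1}

  # Synchronous read, only here so the example can inspect the count.
  @impl true
  def handle_call(:count, _from, count), do: {:reply, count, count}
end

defmodule Router do
  def bump(path) do
    send(whereis_or_start(path), :bump)
    :ok
  end

  def whereis_or_start(path) do
    case Registry.lookup(SketchRegistry, path) do
      [{pid, _value}] ->
        pid

      [] ->
        case DynamicSupervisor.start_child(SketchSupervisor, {PathCounter, path}) do
          {:ok, pid} -> pid
          # Another caller registered this path first; use its process.
          {:error, {:already_started, pid}} -> pid
        end
    end
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: SketchRegistry)
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: SketchSupervisor)

for _ <- 1..3, do: Router.bump("/blog")
Router.bump("/about")

blog = GenServer.call(Router.whereis_or_start("/blog"), :count)
about = GenServer.call(Router.whereis_or_start("/about"), :count)
```

Bumps for different paths land in different mailboxes, so they no longer
queue behind each other in a single process.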

------
lysium
Very educational! I was surprised that there was the possibility of a race
condition where it starts two processes for one path.

------
brightball
That’s a great educational read on Dynamic Supervisors and Registry.

