
Why scaling and parallelism remain hard even with new tools and languages - andradinu
https://www.erlang-solutions.com/blog/the-continuing-headaches-of-distributed-programming.html
======
rdtsc
That's why I like Erlang and Elixir. They were built to handle concurrency
down to the core. Currently only languages on the BEAM VM provide a set of
mature, built-in fault-tolerance features: code reloading, isolated
concurrency units (processes), immutable data, and sending messages between
processes instead of acquiring locks.

I often see frameworks which claim to implement Erlang in "language $x" by
adding a queue to a thread. But that is still very much behind what Erlang
does, because it is missing the other components, namely fault tolerance.

Sure, you can spawn OS processes, or spin up multiple machines/containers. But
that is not built-in, so you have to manage the additional stack for that.
Java has code reloading, but it is not quite the same, and so on.

And this is not just talk in generalities, these properties translate directly
to benefits and profits -- faster development time, less ops overhead (some
parts of the backend can crash and restart, without having to wake everyone
up), less code to maintain and dependencies to manage.

Just the other day I had a typical distributed-systems problem -- we didn't
add backpressure, so messages were piling up in a receiver's mailbox. The
simplest solution I tested was switching a gen_server:cast to a
gen_server:call. It was a 2-3 line change. I hotpatched it on a test cluster
and the problem was fixed in a few minutes. Ultimately I did something else,
but the point of using a language built for concurrency is that it was just a
couple of lines of code. Had it been a custom RPC solution, with some
serialization and some socket code, it would have taken a lot more to write,
test and deploy. All of that adds up quickly and can make or break a project.

~~~
heavenlyhash
The biggest question I have about this: Is it really a good idea to make local
actors and supervisor trees indistinguishable from remote ones?

Reusability is good.

Abstractions that are so "good" that they obscure potentially order-of-mag
performance differences -- I'm not sold. Whether an actor is in-memory with me
or across a TCP socket is an order-of-mag difference, and may make the
difference between whether my program satisfies its service goals or fails
them abjectly. Especially if that randomized latency happens a number of
times, or in unpredictable places.

Error handling paths that have to account for the Two-Generals problem being
indistinguishable from local errors I can be confident I'll hear about -- I'm
not sold on that, either.

I'd be interested to hear rebuttals to those from folks who have worked deeply
with OTP. Is it just... not that bad, in practice? Or are there parts of the
abstraction that specifically help counter these concerns?

~~~
pyotrgalois
In some way this article by one of Erlang creators is related to your comment:
[https://armstrongonsoftware.blogspot.com.ar/2008/05/road-we-didnt-go-down.html](https://armstrongonsoftware.blogspot.com.ar/2008/05/road-we-didnt-go-down.html)

> The fundamental problem with taking a remote operation and wrapping it up so
> that it looks like a local operation is that the failure modes of local and
> remote operations are completely different.

~~~
derefr
I think of Erlang less as making remote operations masquerade as local ones,
and more as forcing you to assume that all local code has the potential
failure-modes of remote code. Which, often (for weird and roundabout reasons)
it does.

~~~
im_down_w_otp
This.

------
jondubois
Highly scalable systems have to be designed in a particular way; it's an
architectural concern. Your choice of language might make it somewhat easier
to implement a highly parallel architecture, but the language itself will
never make it 'easy'. Languages will always give you just enough rope to hang
yourself (that is the cost of flexibility).

You could design a framework which is extendable and scalable in such a way
that developers who want to write code on top of that framework don't need to
think (much) about parallelism (see
[https://github.com/socketcluster/socketcluster#introducing-scc-on-kubernetes](https://github.com/socketcluster/socketcluster#introducing-scc-on-kubernetes)).

Unfortunately, you cannot build a highly scalable system without enforcing
some rigid constraints. General purpose programming languages do not enforce
restrictions on what design patterns you can or cannot use; that is the
role of a framework.

That said, frameworks can never fully hide the complexity of parallel systems
(not for all use cases) but at least they can guide you to the best approach
when solving specific problems.

~~~
Retric
One of the problems is just how far you can get with one cheap server, and
then one not-so-cheap server. From 0 to 100,000 users you can just ignore this
stuff. Then one day you wake up, the world is on fire, and there are no simple
solutions.

At the same time trying to hit 100,000 users on a distributed platform is much
harder.

~~~
chii
When you do a projection of user growth, you'd be able to anticipate the curve
and prep ahead. Those caught off guard are either such a runaway success that
it doesn't matter, or didn't do the growth projection and deserve to fail.

~~~
dorfsmay
Projections are hard.

Sometimes one successful marketing campaign after 10 or 100 failures will make
you go from tens to several hundred thousand users in a very short period of
time. Not enough time to rearchitect.

------
RonanTheGrey
A big part of the issue is that distributed computing is hard to think about,
for the same reasons that multithreading is hard to think about: the single
threaded model (either on a local machine or amongst a set of machines via
something like RPC) is much easier to think about most of the time, even if
it's the wrong overall solution. Conway's Law only kicks in when
organizational issues FORCE distribution, and people are forced to think in a
distributed way.

It's an entirely different way of thinking about a problem, based entirely on
giving up control.

------
IgorPartola
Because it is not a tool or a language problem but a network reliability
problem?

~~~
stcredzero
Also, our current hardware has no good mechanism for low overhead/latency
communication between threads running on different cores. Instead, HFT
programmers have to hack something together like Disruptor and insert unused
arrays of longint to prevent false sharing.

~~~
CyberDildonics
> Also, our current hardware has no good mechanism for low overhead/latency
> communication between threads running on different cores

What is wrong with shared memory? Also if HFT programmers can do it, wouldn't
that imply that the hardware can do it?

~~~
millstone
Here's an illustration; this may be somewhat dated (not sure how this works
with e.g. shared L3 caches) but it should give you a flavor of why shared
memory sucks for messaging.

Consider the simplest example of a busy-loop, which is likely close to the
fastest you can get. Core1 sends a message to Core2 by writing to an address
that Core2 is repeatedly reading from.

Core1 first has to gain exclusive access to the address, by broadcasting an
invalidate on the bus. Core2 thus discards its cache line. Next Core1 writes
to the address by modifying the data in its L1 cache (or store buffer or
worse). Core2 then reads from the address. Since Core1 holds the modified
cache line, it is required to snoop the read. Core1 tells Core2 to wait and
then retry. This happens repeatedly until Core1's write lands in main memory.
Now Core2 can read the memory.

So:

1. Messaging requires a round trip to memory, a bunch of cache line
invalidation, and other nonsense. Messaging involves multiple transitions to
and from shared state.

2. Your CPUs already have a very fast message bus to implement MESI, but it is
unavailable to you as a client programmer.

It should be obvious from this that we could build a much faster CPU messaging
architecture if we cared to.

~~~
pcwalton
x86 CPUs use MESIF or MOESI, not MESI. Those protocols don't require round
trips to main memory.

------
njharman
Is it really that hard? Tons of companies do it. The LAMP stack has been
scalable for a couple of decades or so. There are platforms and services that
you can rent/buy if you don't wanna roll your own. I can get a free Databricks
account and go to town with PySpark; there are even free courses on it.

Threading is hard and I'm sure will always be hard. But threading is the wrong
way to scale or parallelise.

~~~
hbogert
LAMP scalable for a couple of decades? PHP has hardly been usable for two
decades, let alone scalable from the very beginning. Same for MySQL. Did you
actually ever try to scale MySQL? Not trivial. Did you ever try to scale a
webserver with PHP where sessions are being used? Not trivial.

~~~
Thaxll
Yes it's called sharding for MySQL.

~~~
duaneb
If you dislike joins and love data inconsistencies, go for it.

~~~
kod
Joins are still possible in sharded relational models. I wouldn't want to roll
my own on mysql, but there are plenty of data stores that use this model
(vertica, citus, etc)

~~~
duaneb
> Joins are still possible in sharded relational models.

For some definition of joins, this is true. For a SQL-oriented version, I
don't think it's possible, let alone easy, to run arbitrary joins across
sharded data. Can you write subqueries that cross shard boundaries? How about
aggregate functions? I would be (very pleasantly) surprised if this were
possible outside of directly calling the aggregation/join.

> I wouldn't want to roll my own on mysql, but there are plenty of data stores
> that use this model (vertica, citus, etc)

Yes, and none of them have succeeded at scaling mysql without sacrificing
heavily in functionality. I don't believe foreign keys are supported cross-
shard anywhere. I'm also pretty sure the shard schemes heavily dictate which
operations are efficient and even possible, and which are not, so you really
need to design around scaling to begin with _anyway_. So—yes, you can scale
MySQL, but avoid using any of the features that make MySQL worth using, except
intrashard.

~~~
kod
None of the stores I mentioned use mysql. I'm not talking about scaling mysql,
I'm talking about scaling relational databases, specifically the comment about
joins.

Joins aren't a big deal: just join on (at least) the shard key, or, for
smaller hash joins, ship the small side of the join.

If you have multiple kinds of joins keep multiple copies with different shard
keys.

Yes, (monoidal) aggregations are possible with this, yes subqueries are
possible with this.

