
Cluster? F#*k One machine is all you need - zomglings
https://medium.com/@simiotics/cluster-f-k-one-machine-is-all-you-need-c38105f3cb27
======
DLA
Agree 100%.

A 96 vCPU box with > 200 GB of RAM is ~$1.55/hour with an EC2 Spot instance.
Back that with a big SSD and you have a data processing monster. Use you some
nice Golang with well-formed goroutines to leverage all those cores and a damn
good many data processing tasks could be crushed on a single box for sure.
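
A rough sketch of that single-box fan-out, written in Python rather than Go purely for illustration. The task (`count_words`) and the helper names are hypothetical; the point is only the shape: split the input, hand one chunk to each local core, combine the partial results.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_words(chunk):
    """CPU-bound work on one chunk of lines (hypothetical example task)."""
    return sum(len(line.split()) for line in chunk)


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def total_words(lines, executor_cls=ProcessPoolExecutor, workers=None):
    """Fan chunks out to a local worker pool and add up the partial counts."""
    workers = workers or os.cpu_count() or 1  # default: one worker per core
    chunks = split_into_chunks(lines, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_words, chunks))
```

With the default process pool, call `total_words(lines)` from under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module.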

Metaphorically: Every gear you add to a machine (distributed this and that) is
a gear that needs to be cared for (configured, managed) and could break the
overall machine.

Simpler is better.

~~~
zomglings
My favorite recent infrastructure tool is Ansible. Really makes it easy to set
those boxes up and derive full utility from them.

Don't know if I hang around in the wrong circles, but Ansible doesn't seem to
get much love.

------
mikhailfranco
_Scalability! But at what COST?_ (2015)

from the wonderfully opinionated Frank McSherry _et al._,

where 'COST' stands for _'Configuration that Outperforms a Single Thread'_.

[https://www.usenix.org/system/files/conference/hotos15/hotos...](https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf)
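
To make the metric concrete: the COST of a system is the hardware configuration it needs before it beats a competent single-threaded implementation. A minimal sketch of that calculation, assuming you have already measured runtimes yourself; the function name and all numbers below are made up for illustration and are not taken from the paper.

```python
import math


def cost(single_thread_seconds, system_seconds_by_cores):
    """Smallest core count at which the system beats the single-threaded
    baseline, or math.inf if it never does (an "unbounded" COST)."""
    winning = [
        cores
        for cores, seconds in system_seconds_by_cores.items()
        if seconds < single_thread_seconds
    ]
    return min(winning, default=math.inf)


# Hypothetical measurements: cores -> wall-clock seconds for the same job.
measured = {1: 1500.0, 16: 600.0, 64: 350.0, 128: 240.0}
baseline = 300.0  # hypothetical single-threaded runtime in seconds

print(cost(baseline, measured))  # 128 for these made-up numbers
```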

P.S. By pure coincidence, McSherry's new company is also on the front page
today:

[https://news.ycombinator.com/item?id=22359769](https://news.ycombinator.com/item?id=22359769)

~~~
zomglings
Thanks for the link, I really like COST. Will reference this on technical
sales calls.

We've seen a few customers with complex data processing setups using Spark,
Airflow, etc., which we replaced with a single-threaded Python script that
performed better.

------
downerending
Indeed.

Don't use multiple processes if one will do.

Don't use threads if multiple processes will do.

Don't use some sort of multi-host framework if something dumb driven by ssh
(etc.) will do.
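
For the last rule, "something dumb driven by ssh" could be as small as the sketch below. It assumes key-based login is already set up for each host; the function name and the `runner` parameter are hypothetical, the latter added only so the loop can be exercised without a network.

```python
import subprocess


def run_on_hosts(hosts, command, runner=subprocess.run):
    """Run one shell command on each host over ssh, one host at a time.

    Returns {host: (exit_code, stdout)}.
    """
    results = {}
    for host in hosts:
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        argv = ["ssh", "-o", "BatchMode=yes", host, command]
        completed = runner(argv, capture_output=True, text=True)
        results[host] = (completed.returncode, completed.stdout)
    return results
```

Something like `run_on_hosts(["box1", "box2"], "uptime")` (hostnames are placeholders) covers a lot of ground before a framework is needed.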

