I think you're talking past each other a bit here. Nothing the parent is discuss...

m0zg · on Nov 1, 2019

I'm actually serious about that last bit. If your distributed system relies on guarantees it _does not have_ in order to operate correctly, one would be well advised to stay away from it.

ownagefool · on Nov 2, 2019

You can be super serious about it all you want, but it's accusatory and doesn't reflect the post you responded to, which was about the performance and user-experience implications of shooting nodes in the head.

Personally, I think a really good debate was forming. As you said, if you drain a node's traffic before killing it, you're probably right: whatever it costs to maintain consistency is likely offset by the human benefit of not having to wait for processes to shut down cleanly at scale.

But when you throw in the sass, people stop listening and we all get dumber for it.

m0zg · on Nov 2, 2019

But there's no "debate" to be had here. None of the high-scale companies (Google, FB, Amazon, Microsoft, Netflix, others) rely on their distributed system nodes being able to wind down in an orderly fashion. Shit, Netflix and Google (and likely others as well) stage fault-tolerance exercises, taking random nodes (or entire datacenters) out of rotation and checking whether things still work. There's no way to get to five nines if you expect your program to always behave.

Here's one from Netflix that will give you an ulcer: https://github.com/Netflix/chaosmonkey

Here's what Google does: https://www.usenix.org/conference/lisa15/conference-program/...
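To make the "kill a random node and check that things still work" idea concrete, here is a minimal toy sketch. It is not how Chaos Monkey or Google's tooling is implemented. `Replica`, `read_with_failover`, and `chaos_round` are hypothetical names, and it assumes a simple read path where any live replica can serve the key. The point it illustrates is that the client never counts on a victim shutting down gracefully; it just treats the dead node as unreachable and moves on.

```python
import random


class Replica:
    """Hypothetical in-memory replica; `alive` flips to False when it is 'killed'."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def get(self, key):
        if not self.alive:
            # No orderly wind-down: a killed node simply stops answering.
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_with_failover(replicas, key, rng):
    """Try replicas in random order and return the first successful answer."""
    for replica in rng.sample(replicas, len(replicas)):
        try:
            return replica.get(key)
        except ConnectionError:
            continue  # treat the failure as routine and try the next replica
    raise RuntimeError("all replicas down")


def chaos_round(replicas, key, rng):
    """Kill one random live replica without warning, then verify reads still work."""
    live = [r for r in replicas if r.alive]
    victim = rng.choice(live)
    victim.alive = False
    return read_with_failover(replicas, key, rng)
```

With three replicas, the first two chaos rounds still return the value. The third round takes out the last replica and raises `RuntimeError`. That is exactly the kind of gap such an exercise is meant to surface before a real outage does.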

>> we all get dumber for it

Not _all_. Only those who feel inclined to reject the obvious.