Building Resilient Services with Go (fogcreek.com)
110 points by GarethX on Aug 6, 2015 | 19 comments

There's a lot there that looks familiar to the Erlang developer, where many things are built in.

systemd + Docker + Go is basically re-creating Erlang from the ground up, except half broken and not learning from Erlang's 20-30 year history. Now we're training people in the wrong ways to create maintainable distributed systems even when we know better ways to do things. I cry for our industry.

It's been said Greenspun's 10th Rule, when applied to distributed systems, results in Erlang.

Honest question, from one who doesn't want to re-learn the lessons of the past and who would really appreciate a 2 to 3 decade boost.

What are the lessons of Erlang, and how do you abstract as much of that as possible away from the language/ecosystem to others? I'm looking for an answer a little more accessible than "you have to learn Erlang to find out."

Joe Armstrong's thesis (https://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf) has some good stuff.

Part of what makes Erlang Erlang is that a lot of this stuff is integrated very deeply into the language and its libraries. It's not just something you can bolt on in many cases.

The parent poster asked for something "a little more accessible" than learning Erlang, and you responded with a 295-page doctoral thesis. Is there something else that one might have a chance at grokking in a single evening?

Perhaps not readable in a single evening, but this is probably the most accessible thesis I've ever read. If you don't care about the history or motivation behind Erlang, just skip to chapter 3.

However, I think the "why" of Erlang is pretty compelling in and of itself, especially if you're interested in extremely high-availability telecom hardware with an operational life measured in decades.

i'll second `tmm`'s comment. i skim-read Armstrong's Erlang thesis a while ago, it is very readable. if you invest a few hours one evening in reading it you can probably get a good overview of what is important, and which parts you might like to learn about next in more detail.

Erlang isn't hard to learn.

And Elixir is maybe easier. I believe the documentation is making strides, but last I checked there were some rather opaque references to Erlang OTP, and other things that aren't exactly obvious.


> What are the lessons of Erlang

Joe gives good talks. Here's one he recommends as an intro to the capabilities and philosophy of Erlang: https://www.youtube.com/watch?v=u41GEwIq2mE

The entire Erlang thesis falls out of one simple fact: for a program to be highly available, it must run on more than one piece of hardware.

When you run on more than one piece of hardware, you need to be able to communicate between pieces of hardware, detect failures, and when something fails you need to be able to restart it (hacks: systemd, SMF) or reassign responsibility to a different piece of hardware that hasn't failed yet.
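As a rough illustration of the "detect failures and restart" step, here is a minimal Go sketch of a restart loop. It is not OTP's supervisor (no restart strategies, no restart intensity window, no process tree), and the names `Worker`, `runOnce`, `Supervise` and `flaky` are made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Worker is a hypothetical unit of work that may fail by returning an
// error or by panicking. The attempt number is passed in for illustration.
type Worker func(attempt int) error

// runOnce runs the worker and converts a panic into an ordinary error,
// so a crash in the worker never takes down the supervising loop.
func runOnce(w Worker, attempt int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("worker panicked: %v", r)
		}
	}()
	return w(attempt)
}

// Supervise restarts the worker until it succeeds or maxRestarts is reached.
// It returns the number of restarts performed and the last error, if any.
func Supervise(w Worker, maxRestarts int) (restarts int, err error) {
	for attempt := 0; ; attempt++ {
		err = runOnce(w, attempt)
		if err == nil {
			return attempt, nil
		}
		if attempt >= maxRestarts {
			return attempt, err
		}
	}
}

// flaky panics on the first run, errors on the second, then succeeds.
func flaky(attempt int) error {
	switch attempt {
	case 0:
		panic("corner case")
	case 1:
		return errors.New("transient failure")
	default:
		return nil
	}
}

func main() {
	restarts, err := Supervise(flaky, 5)
	fmt.Println(restarts, err) // prints "2 <nil>"
}
```

This only covers failures inside one OS process. A machine failure still needs something outside the process to notice and take over, which is the part the comment above says has to be built in.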

Monitoring and restarting in itself isn't simple. What if what you're monitoring is on another physical machine? You need built-in networking and, for simplicity, built-in location transparency. If I start process Bob on machine A and process Kat on machine B, I should be able to interact with Bob from either machine without specifying where it's running (i.e. RMI that works).
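To make "location transparency" a bit more concrete, here is a small Go sketch of the calling convention only: senders address a process by name and never mention a machine. Everything lives in one OS process here, the node labels are just strings, and `Registry`, `Register` and `Send` are hypothetical names, not any real library's API. A real system would put networking behind `Send`.

```go
package main

import (
	"fmt"
	"sync"
)

// entry records where a named process lives and how to reach it.
type entry struct {
	node    string      // hypothetical node label, e.g. "A" or "B"
	mailbox chan string // buffered inbox of the named process
}

// Registry maps a process name to its mailbox. Callers address processes
// by name only; the hosting node is an internal detail.
type Registry struct {
	mu      sync.RWMutex
	entries map[string]entry
}

func NewRegistry() *Registry {
	return &Registry{entries: make(map[string]entry)}
}

// Register records a named process on the given node and returns its mailbox.
func (r *Registry) Register(name, node string) <-chan string {
	mb := make(chan string, 16)
	r.mu.Lock()
	r.entries[name] = entry{node: node, mailbox: mb}
	r.mu.Unlock()
	return mb
}

// Send delivers msg to the named process. The caller never names a node.
func (r *Registry) Send(name, msg string) error {
	r.mu.RLock()
	e, ok := r.entries[name]
	r.mu.RUnlock()
	if !ok {
		return fmt.Errorf("no process registered as %q", name)
	}
	select {
	case e.mailbox <- msg:
		return nil
	default:
		return fmt.Errorf("mailbox of %q is full", name)
	}
}

func main() {
	reg := NewRegistry()
	bob := reg.Register("Bob", "A")
	kat := reg.Register("Kat", "B")

	_ = reg.Send("Bob", "hello from anywhere")
	_ = reg.Send("Kat", "same call, different node")
	fmt.Println(<-bob)
	fmt.Println(<-kat)
}
```

The point is that the call site looks the same whether Bob is local or remote; moving Bob to another node would change the registry entry, not the senders.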

And all of this is the Erlang VM so far—nothing to do with the language itself. The language adds nice features like pattern matching and atoms and tuples and cons cells and binaries and bignums, but those features are independent of the "Erlang-ness" of the runtime.

There are a lot of great articles written about rationales behind all these topics and a lot of great talks up on YouTube. Just search YouTube for Joe Armstrong talks and/or general Erlang intro talks. You'll soon see how the basics of Erlang are being re-implemented in 2015, except half thought through, and it'll take another 10 years for these ad-hoc remade components to become "I'd bet the farm on it" reliable.

Here are a few lessons of Erlang (mostly stolen from Erlang Factory talks, books such as Learn You Some Erlang and Joe's, etc.).

* L1: Today we are stuck building distributed and highly concurrent systems. We don't have a choice. The era of a single CPU on a single machine, with speed doubling every 18 months, is in the past. Because of the internet and large amounts of data, most systems today are distributed. (How many startups do you know that ship a standalone desktop program these days?)

* L2 : To be useful, distributed and concurrent systems have to be fault tolerant. We don't want a segfault or panic caused by one corner case, triggered by one client, to bring down the rest of the server and the rest of the million connected clients. That would be a horrible page/phone call to get at 4am.

* L3 : For systems to be fault tolerant they have to be built out of isolated components, so that when one fails the failure doesn't spread throughout the whole system.

* L4 : Isolation can be achieved in a few ways:

- A runtime system that prevents sharing memory -- OS processes do this. Erlang's VM does this as well, except you can have millions of processes on a single machine.

- Proving memory won't be shared. Rust's compiler can do this. It can prove at compile time that you won't have data races between your threads. Rust is the best and most interesting new thing in languages in the last decade probably.

- Running in a container, VM or a different machine. At the end of the day, of course, if your service is running on a simple (non-mainframe-y) single machine, it is not fault tolerant.

* L5 : These isolated concurrent units also send messages to each other to communicate (instead of, say, reading from shared memory with a mutex thrown in there some place). These units are often referred to as "Actors", and there is a whole class of frameworks and libraries besides Erlang that implement them (Akka, Orleans, etc.)

* L6 : Erlang, in addition to these basic blocks, also comes with:

- Functional programming approach. Your data and variables are immutable. So it is easy to look at a piece of code and understand what is being changed. State updates are very explicit. And that is on purpose.

- A framework of patterns used to build/monitor/distribute these concurrency units. This is called OTP.

- Monitoring and debugging capabilities. You can connect to a running VM node and inspect, trace, debug, and even update code live without stopping the system.

- A decades-long ecosystem and experience building these kinds of systems.

* L7 : If you are afraid of non-curly braces syntax and do not like typing , instead of ; take a look at Elixir. It is a new language but runs on the Erlang VM.
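Lessons L3 through L5 above (isolated units that own their state and talk only by messages) can be sketched in Go roughly like this. It is an approximation, not Erlang's model: the `message` type and the `StartCounter` and `Add` helpers are invented for the example.

```go
package main

import "fmt"

// message is what clients send to the counter actor.
type message struct {
	add   int
	reply chan int // receives the running total after the update
}

// StartCounter launches an actor goroutine that owns its state.
// Nothing else can reach total, so no mutex is needed.
func StartCounter() chan<- message {
	inbox := make(chan message)
	go func() {
		total := 0
		for m := range inbox {
			total += m.add
			m.reply <- total
		}
	}()
	return inbox
}

// Add sends a request to the actor and waits for its reply.
func Add(inbox chan<- message, n int) int {
	reply := make(chan int, 1)
	inbox <- message{add: n, reply: reply}
	return <-reply
}

func main() {
	counter := StartCounter()
	Add(counter, 2)
	fmt.Println(Add(counter, 3)) // prints 5
	close(counter)               // the actor's loop ends when its inbox is closed
}
```

Note that goroutines share one heap, so the isolation here is a convention the programmer has to keep (never hand out pointers to the actor's state), whereas the Erlang VM enforces it as described in L4.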

Hopefully this helps!

That's a pretty good list. It'd be interesting to really expand that into a lengthy article or short book or something.

Thanks, that really did.

Well, not quite. systemd and all other process managers supervise OS processes; Erlang only supervises its own VM processes, which are green M:N threads. Though one could in principle supervise an OS process as a port driver, you would need to wrap POSIX/Win32/$PLATFORM semantics to do it to the fullest extent.

OS-level virtualization is also completely orthogonal.

Conceptually it is still building a system of concurrent units, which are also isolated from each other's memory space.

Erlang's VM is like a mini OS for your backend system except that it can handle a million isolated processes while the OS can handle a few orders of magnitude less.

Do you have anything more specific to add? This comment doesn't tell me, a non-erlanger, anything other than that you like it more than systemd + Docker + Go. But clearly there are numerous success stories with this very stack (or some permutation thereof).

Erlang programmers have thought about how to solve these problems a lot longer than Go's, for example. The problems being solved today aren't appearing for the first time. You don't see blog posts written about how to do X in Erlang because X has likely been thought about carefully for more than a decade, and a best solution for X likely already exists. Erlang programmers today can worry about building their applications instead of the most fashionable way to do X in Y language.

The latest problem Erlang is addressing is regarding leap-seconds and how they can cause all kinds of strange behavior in applications. In Erlang, this is called Time Warp [1][2]. I have yet to see another language try to tackle time correctness, since most are still too busy ironing out the performance of their garbage collector or how to do concurrency.

[1]: http://www.erlang.org/doc/apps/erts/time_correction.html

[2]: http://learnyousomeerlang.com/time

The only other example that comes to mind (outside of ntpd) is the TeaTime protocol used for the experimental distributed VR system (Open)Croquet, implemented in Squeak/Smalltalk: https://lwn.net/Articles/124712/

But IIRC the whole premise of TeaTime is to not actually do time "correctly", but rather aim for "mostly predictable ordering of events".

Maybe it's time to dust off the old Croquet ideas, and mix the Oculus Rift with a platform based on Elixir?

Another interesting Go deployment. However, although not about that article, I think HN readers might like this post by the same team on how they do code reviews:


I discovered it while reading the parent article. I think it's a good checklist that will contribute to resiliency more than a language choice will. Of course, adding a language with strong type and memory safety to such a good development process will certainly drive quality up from there. Efficiency, too, as recent articles indicate.
