Hacker News new | past | comments | ask | show | jobs | submit login
Erlang and First-Person Shooters (2011) [pdf] (erlang-factory.com)
195 points by Tomte on April 16, 2017 | hide | past | favorite | 41 comments

I saw this deck mid-2014, and it single-handedly made me want to learn Erlang. I felt that any system that can handle COD's load is, at the very least, worth checking out.

About a year later, I got a job doing Erlang full-time, and OTP is about as good as that slideshow indicated...So, if anyone at Demonware is reading this...Thank you! You introduced me to my second great love (after my wife)!

How did you get into erlang, like in project wise?

Because I played around with rabbitmq plugins, a nehe opengl examples port and some erlang notepad someone implemented. I also own both o'reilly hard copies.

But other then that I don't really have a project I can see myself use erlang for(one man shop).

The Cowboy webserver is fairly easy to get started with and I really recommend checking it out, since a web server feels like a "very Erlang problem" to me.

To answer your question though, I got started by playing with TCP and building a basic chat server. I spawned a new Erlang process for each user, used Erlang's internal messaging to share messages between processes, and then passed those messages along to TCP to the user.

A basic rule of thumb on whether or not it's a good Erlang problem is seeing if you can visualize the problem in terms of a bunch of tiny, self contained things (like cells), and you don't want one of those cells failing to have any risk of breaking anything else.

And it turns out a lot of problems actually fit that if you try to think about them that way. Sometimes it's worth -trying- to change your visualization; you'll find it works out better.

We had to write task scheduling software where I worked, and someone unfamiliar with Erlang said something to the effect of "Oh, that's easy, priority queue". Well, no; there are all -sorts- of sharp edges to doing that. Instead we just had one Erlang process per task, with timers (slight gotcha in that you shouldn't use the timer module, instead erlang:send_after). So every scheduled task in the next (time period) was a sleeping Erlang process (and was listening for updates, or for the timer message; on the timer message it would go and actually start executing). Super easy to test, reason about, debug, etc (because the correctness of the task execution was largely isolated from the correctness of the task scheduling) and a far better solution than what the non-Erlang mindset would naturally reach for (since in most languages, more concurrency = bad).

I'm just curious, what are some of the disadvantages to priority queue?

Okay. Let's start with a perfectly naive priority queue. We have a default data structure (nevermind the implementation for now), and a thread that pulls items off the queue, and executes them. Works perfectly in test.

But we notice something; our executions take a non-zero amount of time, and so if we have two events happening simultaneously, they don't both get fired off in time. So now we have to switch to a thread pool, that gets handed events from the priority queue. Task runners, in effect. So, great. Everything works now.

But then we have to start accepting updates to tasks. That is, we are able to move a task forwards or backwards in time. Now, suddenly, we care about the underlying data structure of the queue; can we both be updating the order of events -and- be pulling from it? Does it require locking to do that?

Maybe it does require locking, in which case are we certain that we got ~that~ implemented right? Not just the queue library, but, in the event of an exception in our code, we correctly free the lock? And how long does it take to rebalance the queue? Is that acceptable, since we can't be pulling from the queue during that time?

Maybe it doesn't require locking, such as when using immutable data structures where we build a copy and swap it out, but then we might have a chance of executing a task twice (i.e., our task launcher thread reads copyA, while we're modifying copyB, and so the head of the queue in both gets executed), so now we have to keep tabs on what we launched last. But that assumes a stable priority queue in the event of two items having the same time; -that's- a fun edge case to miss.

But maybe we get all that working, by getting a lock-free priority queue in place. We have one thread reading from a priority queue, and handing tasks off to a thread pool. What size is the thread pool? What happens if a task is long running? Can we exhaust the thread pool?

What happens if we need to scale out a bit, because we're exhausting our thread pool? We only have one thing reading off the queue; can it hand those off to another machine in a timely enough fashion? Probably not, especially if we're getting GC pauses; how do we solve that? Maybe we change it so that we launch things a little early, either locally or remotely, and the thread sleeps until it's time to start the task. That will work, but what happens when we want to update the task? Now the logic exists in two places; the task could still be on the priority queue in which case we have to modify the queue, or the task could be sitting in its own thread, and we have to modify it there (arguably this existed before; if a task had started you modified it in the thread it was executing in, if the task hadn't started you had to modify the priority queue, but this makes it far less determinate).

Etc. In Erlang, you skip past all of these issues, and implement the end solution as your first, rather more easily. There's still a data structure somewhere saying what tasks have to execute, and when (probably just in your database, though, nothing needed in memory), but you don't have to worry about managing it beyond "hey, make sure the items for the next (time period) are running in a thread (Erlang process)". You can do that with one process launching the others, and (naive) distribution across machines is incredibly easy (the same complexities of handling partitions apply to both solutions). Individual tasks are built from the beginning to handle sleeping until start time, and updates are trivial, being just a message to them that they respond to, and a modification of your data structure (database). There's no tight coupling between the two. You end up with just "whenever a change in events happens (creation or update), send a message to that process if it exists, and update the DB". And the process scheduler in Erlang handles all the concurrency. There's no locking you have to worry about, and latency will be spread equally, rather than running the risk of choking out a single task.

And this is the -naive- Erlang implementation. That's the benefit. For this particular use case, with priority queues you start running into issues, and as you solve them, your implementation looks more and more like the naive Erlang implementation.

Thank you so much for the write up, I'm just getting into programming at school and this was a fantastic and interesting read! I've always been interested in how data structures work in real-life applications and I'm thankful for the explanation.

Got a good book or website recommendation for learning Erlang?

How does it compare/contrast with something like Akka (which I've got rudimentary experience with in Scala)?

Akka was basically the Scala implementation of Erlang's actor model.

The key differences are, for me, Erlang is easier to reason about (this is mostly how my mind works; Akka it felt like everything was reactive, that nothing is doing any work until it gets a message; Erlang it felt like the opposite, that every process is doing work until it decides it needs a message, and then it can check its mailbox, and optionally block until something ends up in it), and because of its functional nature, things that require special keywords in Akka, just fall out 'for free' in Erlang. For instance, 'become'. 'Become' is just calling another function in Erlang, that has different behavior. Erlang also has more reliability characteritics (not as efficient, but, no stop the world garbage collection, shared nothing, so the only way an exception in one actor can affect another is via an OOM, or specifically hooking up links/monitors such that it causes it to affect the other actor). Akka, however, gives you the speed of the JVM (which -is- better, in pure number cruncing, than Erlang), more libraries (though Erlang is no slouch there, and they all kind of fit the Erlang model), and OOness, if that's your thing.

If you could share that code it will help a lot of people to see the power of Erlang as perceived by a novice coder in the language.

I sadly do not have access to that code anymore. I wrote a basic MVC framework using the aforementioned Cowboy server early last year though: https://github.com/Tombert/Frameworkey-Erlang

... Keep in mind I've gotten a lot better since I wrote this!

From a gamers perspective, the peer-to-peer lobby system used in the CoD series is one of the worst things about it.

I can remember constantly having to migrate hosts, getting owned by the host because everyone else has a ping much higher than zero, and horrible lag when the host it chose had a bad connection.

Cool technology, but at the cost of fun.

Eh, I played CoD for a long time in middle and high school The community loved to complain about the whole lack of dedicated hosting thing but I rarely had bad experiences with it. Plus, CoD 4 and CoD WaW both had dedicated servers but are mentioned in this slide deck, so it sounds like they'd be using this middleware regardless.

At CoD scale I'm sure the ability to not rely on first party hosted servers saved enough money to justify the occasional service interruptions, and this system is better for groups of players that might be close to each other but far from an official data center.

yeah, I remember hearing that. I wouldn't necessarily consider that a sleight against Demonware though, considering that it's just a really hard problem to solve. Also I'm sure Activision pushed hard for it as it meant much reduced server costs.

As they say, when they first started with COD they had tons of scaling issues because they had never had so many before, but Erlang caused relatively few of their issues.

It should probably be mentioned that I am in Australia, where the majority of people have less than 12Mbps down, and even more likely less than 1Mbps up

I wonder if they have considered moving to elixir since some of the problems they have highlighted are resolved in elixir.

That would be easier for new employees and they could still keep their Erlang code base until they make the full switch.

I'm wondering the same :)...

In 2011 Exlixir didn't exist. More than likely they've stuck to Erlang as the incremental benefit is really on the language side only. If you're only moving to Elixir because of the language sugar, the only thing you're going to gain is maintenance and updates -- with a system that probably doesn't need all that much in the way of "tinkering".

Elixir doesn't just add "sugar" though; it has full-fledged first-order macros, through-and-through UTF8 support, sensical argument placement (the thing being acted upon is always the first argument, making possible the |> operator), sigils, great tooling, and possibly best of all, the best Rubyists are fleeing to it. ;)

Keep in mind that those are online services and can be developed in every languages since it's not hard real time ( aka C++ gameservers ). Most online services for the video game industry are actually made in Java.

> can be developed in every languages

Did you read the slides?

> Problems remain

   – C++ is the wrong language for concurrency
   – Code was becoming impossible to maintain
   – Poor error handling / debugging / metrics / scalability
   – Had to disconnect all users to change configuration
The point isn't that these things are impossible in other languages, it's that Erlang made solving these problems easier than other languages.

Game servers don't do hard real time. That would be overkill, and requires specialized operating systems typically used for niche applications like robotics; e.g. https://en.m.wikipedia.org/wiki/RTLinux

I've been reading up on VR after trying an Oculus Rift. It sounds as if VR game servers will need something a bit closer to "hard" real time than current servers. I suppose it still wouldn't technically be "hard real time" but much "firmer real time". I guess this leaves me wondering if games will ever progress to needing "hard" real time.

That doesn't make much sense, think about lag, playing with somebody who is far enough that the speed of light makes a difference. You want a near real time graphics pipeline for each user to avoid vomiting, but other than that you will still get terrible terrible network problems one way or another which you can't control and will show up as ghosts in the game, stopped players, etc.

Question for those of you who are game programmers. Is the message passing of erlang slower than shared memory threads in c++?

Depends. Erlang's scheduler is clever: if the receiver is asleep waiting on its inbox, sending to it will switch the running process to the receiver and back, all without hitting the scheduler logic. This is a great benefit for "thread" (in Erlang it's a process) pools, which are very very widely used in Erlang programs. It's really the best of both worlds: no Erlang context switch and it's completely transparent to one's codebase.

This isn't possible if the receiving process's inbox is non-empty or it's running. Then the cost should be around the same, assuming your C++ implementation is at the same level of quality as the Erlang VM.

Not a game programmer, but do a lot of multithreaded work. There's about 100000 ways to pass messages in c++, for some c++ is faster (purely bounded by cache communication), some erlang is faster, and for some it's hard to compare (send latency, receive latency, throughput, can all vary greatly).

That only applies for two processes running on a single machine, Erlang can automatically distribute workers (I believe) and in that case it's probably more about networking hardware than language.

Everything I've ever heard on that front is that Erlang should be considered a control layer which has any CPU-bound tasks run in C. Actors/message passing is a good synchronization primitive but there is apparently a whole host of issues with memory leaks regarding it in the "Erlang in Anger" e-book.

I personally lost interest in Erlang because of it only supporting actors as a concurrency mechanism. They're good for some things, and probably exceptionally good at what Erlang was designed for (Telecom systems), but I think they're just one tool in the toolbox. If the recommended solution for CPU-bound tasks is to start writing C, I have to wonder if there's a better way.

Memory leak isn't quite the right word for what I think you're referring to: processes that most forward binaries without generating enough garbage on their own heap did not trigger garbage collection, leading to too much memory being used. This behavior is better in recent releases (otp 19), the space used by binaries is added to the per process accounting so that it can trigger garbage collection.

I agree, if you don't fit into the actor pattern, erlang is probably not a great choice, although the bit syntax is pretty nice, too. Sometimes you can leverage erlang for distribution and supervision and drive the c code; but sometimes it's just not a good fit.

What concurrency patterns have you found yourself unable to express using actors?

Well short answer long term it's the only viable model you are not going to use locking and STW gc on 500+ core system with few TBs of RAM.

Lock based concurrency is perfectly fine on 800 core systems with a pauseless gc and many TiB heaps. See Azul for an example using the JVM.

I second the question of wondering what sort of concurrency is not well-serviced by the actor model? (especially considering you have zero locking to worry about)

I'm not a game programmer, but I've got plenty of experience with parallel programming.

The answer is that either message passing or shared memory can be slower or faster. The architecture of your program probably matters more, though obviously some problems lend themselves more to a shared memory solution and some more to a message passing approach (and many benefit from hybrid solutions).

The main cost factor for message passing approaches is communication overhead. The main cost factors for shared memory are contention and fighting the memory hierarchy (keep in mind that even with shared memory, inter-thread communication will generally have to move data through the expensive parts of the memory hierarchy). Your goal in either case will be to architect your solution to minimize the overhead that you are dealing with.

Also, you're starting to hit physics issues once you want to do shared memory past maybe 16 cores or so. Thermal issues, obviously, but also memory bandwidth and such. And the solutions to these can have an economic impact where choosing a more distributed approach can simply be cheaper. When you're dealing with thousands of concurrent processes (as some Erlang applications do), then a pure shared memory solution is usually right out (though various hybrid approaches can still have advantages over a pure message passing system).

Message passing in Erlang will be a fair amount more expensive - Erlang and C++ have far, far more differences in terms of performance than just message passing vs shared memory.

You're better off comparing C++ and Pony. At that point, a message enqueue is 3 atomic instructions, not too bad. The benefit is trivial concurrency.

A: Yes.

B: Doesn't matter since you're not resource constrained on the backend.

here is the video of the talk: https://vimeo.com/26307654 and hopefully, provides a nice complement to just going over the slides...

Since some people were asking on how to get started with Erlang, some time ago this MOOC was posted on HN for an intro to learning Erlang/functional programming.

What I liked about it is that it assumes you have an imperative programming background so it isn't entirely like learning at a very slow pace (which leads me to lose interest): https://www.futurelearn.com/courses/functional-programming-e...

The next course after this is on the concurrency part of Erlang which is also on FutureLearn that I'm currently taking. It has video clips from Joe Armstrong teaching concurrency (one of the creators of the language).

An important take-away is that's it's very hard to know the bottlenecks beforehand. As an experienced programmer you can have a hunch, but you still need to test and measure.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact