I'm primarily a Python programmer, and while Python is a great general-purpose language, Erlang is what I've found most appealing as a best-in-class server-side language. From the reading and research I've done, it seems like an excellent next step.
On the implementation side, like Python it's weak at CPU-heavy tasks, but a normal program maintains low latency, unbeatable uptime, and hot code swapping. For CPU-heavy work, I presume most Erlangers do what we all do: C (or maybe Rust) libraries.
On the language side, it's terse like Python, and it's a small language.
I fail to see why this shouldn't be the server language of choice, other than tougher recruiting. Learning about FP would be interesting, too.
I've dabbled in Go and Node.js, and keep up on developments with Rust, but keep coming back to Erlang as what I'd like to deep dive on next.
I'm at an Erlang company; it's hard to find people who already know Erlang, so most of our server hires (including me) are smart people who are willing to learn a new language. Coming from a mostly PHP/Perl background with a bit of C, Erlang is weird, but it's a great fit for our use case. I haven't played with any other FP languages, but Erlang seems more pragmatic than most: it's easy to have side effects, and the OTP libraries include a lot of useful stuff.
Anyway, we have enough non-core stuff (the mostly static website, resource translations, etc.) floating around that's not in Erlang that having people around who used PHP or Python in a former life is a good thing. :)
Where do you work, if you don't mind me asking? I'm always curious as to which companies use Erlang. My interest in the language came from Elixir; now I can't stop playing around with Erlang/Elixir.
We still use it for our new product (marketing tool for the music industry) but while it certainly works for us, we no longer have to support tons of simultaneous connections to our servers. The backend tech could almost be anything.
Ah neat, I used Soundrop when it was around. You guys should do a write-up on the architecture now that it's closed down. Any material on in-production BEAM apps is a gem for me.
Thanks for having used Soundrop! :) A write-up of our architecture (and its many rewrites) is in our backlog, but don't hold your breath; right now we're focused on building our new product.
Having used a language with an advanced type system (Scala), I couldn't go back; it's a real sweet spot of power. In Python, async is either painfully explicit (callback-based APIs like Twisted) or (locally) completely invisible (stack-slicing approaches). In Erlang, actors are a language-level construct, and the language makes a grammar-level distinction between sending a message and calling a function. In Scala, the distinction between a sync and an async method is visible but concise, and Future is just an ordinary type with ordinary methods; there's an actor implementation if you need it, but it's just an ordinary library. You can implement your own "behavioural contexts" and use existing library functions to manage them.
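For anyone who hasn't seen it, here's roughly what that grammar-level distinction looks like in Erlang. A minimal sketch; the module and names are mine, not from any real codebase:

    -module(pingpong).
    -export([start/0]).

    %% `!` sends a message asynchronously; `receive` blocks on the
    %% process mailbox. Neither can be mistaken for a plain function call.
    start() ->
        Pid = spawn(fun loop/0),
        Pid ! {self(), ping},
        receive
            pong -> ok
        after 1000 -> timeout
        end.

    loop() ->
        receive
            {From, ping} ->
                From ! pong,
                loop()
        end.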
Certainly take the time to learn Erlang, but when you get the chance it's well worth checking out a modern typed functional language - F# or Scala or Haskell.
Take a look at Elixir on your deep dive too; it compiles to run on the Erlang VM, follows the OTP model, and you can use native Erlang libraries directly in your code.
Overall it offers a higher-level entrance to the Erlang world, and coming from Python you may find the syntax slightly easier to digest (anecdotally).
You can certainly port slow Erlang code to C, but it's not as straightforward as doing so in, say, Python or Ruby. For one, if your C code crashes, it's not the calling Erlang process that crashes, but the entire VM [1]. Second, NIF implementations are supposed to cooperate with the BEAM scheduler and only run for 1ms at a time; jlouis points out that most NIFs fail at this [2]. Third (and maybe the expectation here is too great), binary data (structs, etc.) used or provided by NIFs are "magic binaries" which cannot be pattern matched and show up in crashes, the REPL, etc. as "<<>>", which can be confusing [3].
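(For context, the Erlang side of a NIF is just a stub module that the shared library overwrites on load. A minimal sketch; the module, function, and library names here are hypothetical:)

    -module(my_nif).
    -export([fast_sum/1]).
    -on_load(init/0).

    %% Runs automatically when the module is loaded; on success the
    %% C implementation replaces fast_sum/1.
    init() ->
        erlang:load_nif("./my_nif", 0).

    %% Fallback that only runs if the shared library failed to load.
    fast_sum(_List) ->
        erlang:nif_error(nif_library_not_loaded).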
> For one, if your C code crashes, it's not the calling Erlang process that crashes, but the entire VM [1].
NIFs are one of several ways of linking C and Erlang, and the one most similar to how Tcl/Python/Ruby/etc. link in C code. In those runtimes, if your C code segfaults, you will also crash the interpreter. You're correct that NIFs are a bit trickier to integrate because you have to consider the scheduler, whereas Tcl/Python/Ruby don't care if you call out into C code and it takes a minute to complete.
Erlang offers you other means of working with C code, like ports and C nodes, which communicate at a distance and run in separate Unix processes. This is a compromise: communication is more expensive, but the system as a whole is more robust.
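A minimal sketch of the port approach, assuming a hypothetical ./crunch executable that speaks 4-byte-length-prefixed packets on stdin/stdout:

    run(Input) when is_binary(Input) ->
        Port = open_port({spawn_executable, "./crunch"},
                         [binary, {packet, 4}, exit_status]),
        Port ! {self(), {command, Input}},
        receive
            {Port, {data, Result}} ->
                {ok, Result};
            %% The external process died: we just get a message here,
            %% and the VM itself is untouched.
            {Port, {exit_status, Code}} ->
                {error, {exited, Code}}
        after 5000 ->
            {error, timeout}
        end.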
> I fail to see why this shouldn't be the server language of choice. Other than tougher recruiting.
Actually, I'd argue that in some sense it makes recruiting easier. It's very hard to find idiots who can be taught to write Erlang (or Haskell, OCaml, or Clojure), so the SNR should be very good.
You don't really know if someone can be taught the language or not until after they're hired, though, so it doesn't really help with the SNR in interviews.
Checking whether the candidate can learn the language helps if the hiring process involves a small assignment. It can be much smaller than a typical one, because learning a new language is difficult enough.
There's also the case where the candidate already knows a functional language, just not Erlang (or whatever is required). Then one might assume the candidate can learn one more functional language.
I work in an Erlang shop, and I see candidates who are really interested in learning a functional language. These candidates tend to be of a higher caliber. In other words, it's not that we suggest learning Erlang... the candidates do, and that's a good signal.
For CPU-intensive operations, the JVM is much better than Erlang or Python, and with the new breed of JVM languages I feel more productive than I ever felt using Python. Speaking of RTB platforms, the JVM is the other high-level server-side platform that can handle those loads.
Erlang sounds nice, but I don't like platforms that are too specialized for certain use cases. I also never understood the "C libraries" argument: that's what made Python non-portable, and it's what will kill it.
I fell into doing RTB (buyer side) with the JVM (Java/Scala) many moons ago, and this was a mistake. The GC isn't cooperative enough to meet the expected latencies in the 95th percentile of requests and above. There were erratic pauses, even after extensive GC tuning. I wouldn't do it again.
After coming to Google and working on the other side of the RTB process for a while.. well, I just wouldn't do it in anything other than something like C++ or Rust -- a systems level language where I can fine tune everything and not be interrupted by a GC. I just wouldn't mess around -- not only is it important to have low latencies, it's important to have _consistently_ low latencies.
I think that using a manual memory language is fine if that is the engineering decision you want to make.
I'd point out that people can and do use GC'd languages in environments that are dramatically more latency sensitive than RTB. For that matter, in latency sensitive applications allocation/deallocation after startup is a no-no, so GC tends to not be the major reason not to use the JVM (memory layout/unfettered access to system calls/etc tend to drive that decision).
In my experience, Java doesn't really offer a truly realtime-friendly, non-blocking collector that delivers consistent latencies. There are a million flags you can set for the GC in Java, but nothing will stop GC pauses from happening entirely.
95% of the time the latency is acceptable. The problem is that under heavy throughput the latency spikes periodically. I wish I still had some of my old graphs to show.
I just wouldn't do it again; the time spent futzing around trying to tune the JVM to avoid this would have been better spent writing code in a systems-level language where allocation is predictable.
I was a big booster of the JVM as a mature platform for a lot of things. But for this purpose... nope.
My point is that most truly latency sensitive applications, in GC'd languages or not, don't do any allocation/deallocation on the hot path. Memory management is just too slow.
So GC twiddling is a red herring in those environments, though using manual memory techniques on the JVM may obviate the major reasons to use it in your particular cases. Lots of teams make the engineering decision the other way and use the JVM in usages that require more consistent latencies than RTB requires.
Well, per the OP, Erlang is pretty nice there, since you don't have the GC pauses you see in most JVMs (Azul being the possible outlier). You get all the benefits of automatic GC without most of the negatives: collection is generally quick, and it's stop-the-process (the others keep running on other cores), not stop-the-world.
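You can poke at this from the shell: erlang:garbage_collect/1 collects a single process's heap while every other process keeps running (the numbers below will vary):

    1> Pid = spawn(fun() -> receive stop -> ok end end).
    <0.88.0>
    2> erlang:garbage_collect(Pid).   %% collects only Pid's heap
    true
    3> erlang:process_info(Pid, [heap_size, message_queue_len]).
    [{heap_size,233},{message_queue_len,0}]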
Yes, I'd definitely consider Erlang for this type of application. Though at the time I was implementing an RTB bidder (the 2010/2011 timeframe), Erlang still did not offer actual multicore support, was still considered quite exotic (and probably still is) and therefore made pointy-haired types nervous, and my negative experiences with RabbitMQ at the time (very unstable, and not actually that performant) made me leery of using it in production code.
After working on the bidder side of RTB, I then had two jobs where I worked on the other side, _sending_ bid requests, one of which was here at Google. I learned a lot.
Erlang supported multicore well before then; SMP was enabled by default from R12B onward, released in 2008. The other considerations are valid, though (well, Rabbit maybe not so much; when a language is as rare as Erlang, it's easy to judge it by a single popular app).
The biggest weakness of Akka (Erlang for the JVM) is the GC. If you aren't really careful about memory usage it's easy to trigger big GC pauses that render your node unreachable by the rest of the cluster.
CPU-intensive operations pose their own problems. EC2 doesn't have amazing context switching so sometimes Akka's internal dispatchers won't be able to get CPU time quickly enough to keep the cluster heartbeats going, also causing your node to be marked unreachable by the rest of the cluster. I imagine bare-metal would be far easier to work with in this regard.
There are certainly ways to deal with these problems by tuning the GC and Akka failure detectors, but it's a serious problem with Akka.
Erlang actually does have native-code compilation as well (HiPE), so other than native third-party libraries, or numeric work that Erlang isn't well suited for, it arguably requires less C help than Python.
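(For the curious: assuming your OTP build includes HiPE support, native compilation is one flag away, per module:)

    1> c(my_module, [native]).
    {ok,my_module}
    %% or put `-compile([native]).` in the module itself.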
FYI: current peaks for RTB are about 2.5M qps worldwide today. Some exchanges are seeing as many as 100B impressions per day, mobile and display combined.
RTB is an amazing challenge and way too much fun for those inclined to bang their heads against all sorts of problems you never run across in the "regular" world.
I've done some work on these systems. One of my favorite challenges is writing tools that segment impressions in real time: you have only 10ms to respond, because you're augmenting the bid before others can bid (an RTB bidder typically has about 100ms to respond). When you're shooting for 10ms at 2.5M qps peaks, you have to think about everything: kernel versions, network settings, pinning to processors, avoiding GC, whether logging to disk is too slow, etc.
As someone close to the RTB world observing from the outside, what amazes me is not the throughput or latency numbers but that it all happens on technology stacks that seem inappropriate for the problem: everything from broadcast messages over TCP/IP, and request/reply semantics for messages with no useful replies, to having to scale connections even though the number of participants is fixed. It's a whole world built on HTTP that doesn't seem like it should be. That you can scale qps in that world while maintaining a semblance of response time is really something.
As a complete tangent, as someone who has worked in environments that had at least an order of magnitude tighter latency requirements than the RTB world, and who sees this misconception a lot, logging to disk on a modern OS is not "too slow". There may be very valid reasons not to log to disk, most obviously operational concerns, but latency/throughput aren't one of them.
Were these guys a very small shop? Any info on which exchanges they bid on? The RTB company I worked at was considered "small", and we were handling around 300,000 requests/second with a 15ms max spent per request in our systems, across just Google's and Facebook's exchanges.
Maxmind - good. Not like there's many choices here, though. :)
Instead of "waking up to sync logs", consider using something like NSQ to emit events as they happen. You can scale the number of servers/processes generating messages and the number of workers consuming those messages (and committing them to your database) very easily.
You could also replace the writing of the transaction log with a NSQ event. It lets you avoid having to write and scale the log shipping stuff.
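To make that concrete: nsqd also exposes a plain HTTP publish endpoint, so even a tiny producer can emit events without a client library. A minimal Erlang sketch, where the host, port, and topic name are my assumptions:

    %% inets must be running first: ok = inets:start().
    emit(Event) when is_binary(Event) ->
        httpc:request(post,
                      {"http://127.0.0.1:4151/pub?topic=transactions",
                       [], "application/octet-stream", Event},
                      [], []).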
We precalculated which ads a given user was eligible for and a separate process was contacted when a bid request came in to get the info for the ad to show. We never had to do anything funky to do geotargeting exclusion at scale.
Instead of having your adserver connect to a database, have a separate process generate a working set (as JSON or whatever you fancy), compress it, and ship it to the adserver periodically. The adserver can just do a straight load from the file every minute or whatever interval you'd like. If the file's mtime is too old, raise an alert and stop serving ads if necessary. Keeping things separate and simple lets you scale more easily. Our working sets averaged about 2 GB uncompressed and could be loaded in a few seconds (C++ and JSON, later Go and JSON).
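Since this is an Erlang thread, here's what that mtime check might look like there. A sketch only, with made-up names and thresholds:

    -include_lib("kernel/include/file.hrl").

    maybe_reload(Path, MaxAgeSecs) ->
        {ok, #file_info{mtime = MTime}} =
            file:read_file_info(Path, [{time, posix}]),
        case erlang:system_time(second) - MTime of
            Age when Age > MaxAgeSecs ->
                %% Working set is stale: raise an alert and, if
                %% necessary, stop serving ads.
                {stale, Age};
            _ ->
                file:read_file(Path)   % load the working set
        end.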
Seems like it was a fun project and I hope you learned a lot!
Anycast will spread incoming requests out to N physical machines; then you can do your layer-3 load balancing, then your SSL termination and HTTP load balancing.
We didn't use anycast, though. I suggested it to our CTO many times and it would have saved us over $5,000/mo in DNS costs, but it never got done.
Even if a single box were capable of handling that many requests, you should never have just one box as the main entry point to your servers (it's a single point of failure).
I also suspect he refers to 300k distributed across multiple datacenters.