
How And Why We Switched from Erlang to Python (2011) - vector_spaces
https://engineering.mixpanel.com/2011/08/05/how-and-why-we-switched-from-erlang-to-python/
======
lightcatcher
I was an intern at Mixpanel at the time (but not the author of this article or
involved in the discussed rewrite). For context, the company was then <10
people. (I'm surprised to see anything from my first professional software
experience on top of HN!)

My recollection of events (of 8 years ago): The Erlang endpoint was written a
year or two earlier as someone's first Erlang project, in a tiny org with no
Erlang experience. The endpoint worked well enough and people moved onto other
projects in the mostly Python codebase. Eventually, it became painful to debug
or add features to this endpoint because no one was particularly proficient in
Erlang. Ankur (author of the post, intern) then rewrote it in Python.

I wouldn't read this as a negative about Erlang or a positive article about
Python. I'd instead read it as "use a language that makes sense for your
organization and already existing codebase".

~~~
bradleyjg
In general language diversity at a company is a big negative. Obviously
sometimes there are even bigger negatives that mandate overriding that rule--
you aren't going to use Java on the front end or javascript on the backend,
that would be madness--but you never want two languages in the same niche.
That's a cost with no benefit.

If you have some developer that wants to write something in a new (for your
company) language, or even worse goes off and does so without asking, because
he wants to try it out, you probably want to fire that guy. Either he doesn't
understand the downsides or he does, but doesn't care.

~~~
cellularmitosis
Some dev: "Hey, I noticed one of our more complex API endpoints had some
pretty bad response latency, so I dug into it a bit. It turns out our
PHP/Symfony/Doctrine stack was generating over a hundred SQL queries, all of
which were being performed sequentially because we don't have a good per-
request concurrency story in this tech stack. So I wrote an alternate
implementation in Clojure which has a great concurrency story and cut the
latency by 75%."

lol yeah that guy should totally be fired.

~~~
delian66
Another developer: "...our stack was generating over a hundred SQL queries,
all of which were being performed sequentially because we don't have a good
per-request concurrency story in this tech stack. I just bypassed the ORM and
wrote a slightly more complex SQL query, so that there is no more N+1 query
problem in the backend. The latency of this end point was cut by several
orders of magnitude, and our database is not overloaded by parallel
inefficient queries, run by clients with 'great concurrency stories'."
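
A rough sketch of the difference, using Python's built-in sqlite3. The schema,
data, and function names are made up for illustration and are not from the
thread or the article:

```python
import sqlite3

# Hypothetical schema for illustration: authors and their posts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")


def titles_n_plus_one(conn):
    """N+1 pattern: one query for the authors, then one more per author."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries = 1
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """Same data in one round trip. Note: an inner JOIN skips authors
    that have no posts, which the loop above would report as empty lists."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors AS a
        JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

With this sample data both functions return the same mapping, but the first
issues one query per author on top of the initial one, so its query count
grows with the number of rows.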

~~~
Izkata
Or, "here's a completely supported not-well-enough-known ORM feature that does
that, that I learned about when it looked like I wasn't doing any work".

------
tombert
In some fairness to the writer, Erlang has improved a _lot_ since 2011.

I'm a big fan of the Erlang platform, and I even kind of begrudgingly love the
language, but Erlang didn't even have first-class immutable maps until 2015,
and while I actually think the semi-prolog pattern-matching syntax is elegant
and beautiful once you get a handle on it, I actually really dislike the whole
comma-semicolon-period thing. (Personally, nowadays when I say I'm writing
"Erlang", I'm actually writing LFE).

Also, the best webserver for Erlang is Cowboy (at least in my opinion), and
the initial commit for that wasn't until 2011, and it wasn't really an awesome
server until around ~2014. I'm not overly familiar with the Python ecosystem,
but if I were to guess, the server frameworks probably _were_ more mature than
a lot of the Erlang ones.

I don't fully agree that it's difficult to benchmark Erlang, since it has a
lot of awesome diagnostic tools available out of the box and moreover it even
has stuff like caching and whatnot built-in with ETS, so I think that argument
comes down to "we didn't really know what we're doing", but, you know, that's
a valid enough reason to not use a platform if your job is to get something
done. I wouldn't recommend most people write their new apps in Idris or Coq
either, for no other reason than it is difficult to fully pick up and use
those platforms easily.

~~~
rrix2
Are you writing LFE in your professional capacity? I've always been curious if
LFE was more than a sort of toy language, it's hard to tell from outside the
community. I've been becoming more and more enticed to learn Elixir (and
Phoenix) and the BEAM, and LFE would be a nice thing to add to the tool belt.

~~~
tombert
Alas, no, I do not get to do Erlang in any kind of professional capacity now,
and haven't since 2015.

That said, I do use LFE for a lot of personal projects. I don't mind it being
niche-ish largely because Virding wrote it to be a very thin wrapper around
vanilla Erlang, so as a result there's not a lot to mess up. With the
exception of macros, there's really not many changes to the semantics from
regular Erlang, meaning that the transformation to and from LFE is trivial.
LFE mostly exists to keep the syntax consistent and cleaner, and while the
_language_ is kind of toy-ish, it almost doesn't matter.

Honestly, I'd be a lot more worried for something like Joxa or Clojerl, not
because they aren't awesome projects (they are), but they don't have as direct
of mappings to the BEAM semantics.

Also, Virding is very active on the #lfe Erlangers Slack, so typically when I
have a question, it is resolved pretty quickly.

------
pdimitar
Them not being able to utilize Erlang well seems to be the gist of the
article. The rest seems like a post-hoc rationalization.

It's fair to switch away from a technology your team doesn't know well and
doesn't want to learn. Maybe that should have been the title.

~~~
vidarh
Long time, Dimi..

It's a favorite pet peeve of mine too that people often try to rationalize
either their lack of experience with a platform or a broken architecture as a
problem with the language in question. Though sometimes it's done because it
helps paper over internal politics (e.g. lack of clout to be perceived to
criticize past decisions) by blaming something external like a language or
framework to justify architectural changes, so it's not always that these
teams don't know what the actual problem is.

(Twitter's transition from Rails back in the day always springs to mind as my
go-to example of this, because as much as the transition might well have been
appropriate, and as much as they claimed otherwise, the real problem was never
Rails, but the fact they'd written a monolith instead of building a system
where the message delivery was designed to be sharded; no language or
framework choice could have saved them from a rewrite)

~~~
robertAngst
> they'd written a monolith instead of building a system where the message
> delivery was designed to be sharded; n

Do some languages handle this differently?

Isn't this based on how the code was written rather than a language natively
sharding?

~~~
hn_throwaway_99
Sharding is basically a data storage concern, so it's not something that the
language can really control.

There _are_ however, frameworks that enforce some level of "shardability". The
Datastore (now Firestore) in AppEngine/Firebase comes to mind, in that your
data is sharded all over the place (the details are hidden from you), and you
can't write un-indexed queries that don't scale well. The Datastore has some
limitations that definitely seem awkward if you're coming from a relational DB
world (e.g. no joins, limits on the kinds of inequality queries you can do,
etc.), but these limitations are there specifically so Datastore can guarantee
performant distributed queries.

~~~
sb8244
Sharding is not just about data storage. You can shard parallel units of work
across multiple CPU cores in order to prevent single CPU bottlenecks.

Erlang encourages shardable solutions because OTP concepts are so strongly
integrated into the language.
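
As a minimal sketch of what sharding units of work can look like (in Python
rather than Erlang, with hypothetical names throughout): route each unit to a
shard by a stable hash of its key, so that every shard can be owned by one
worker and no worker needs to coordinate with another.

```python
import zlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index. crc32 is used because Python's built-in
    hash() for strings is randomized per process and so is not stable."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(events, num_shards):
    """Group (key, payload) pairs by shard, preserving arrival order
    within each shard."""
    shards = defaultdict(list)
    for key, payload in events:
        shards[shard_for(key, num_shards)].append((key, payload))
    return dict(shards)


# Hypothetical events keyed by user id.
events = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d")]
shards = partition(events, 4)
```

All events for a given key land in the same shard and stay in order, so each
shard's list could be handed to its own worker process (or, in Erlang, its own
process) to spread the load across cores.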

------
nemothekid
1\. This article was written in 2011 (it's 8 years old). I was confused why
they were going with eventlet, but the age of the article explains why.

2\. Even for 2011, 1k rps with an average latency of 100ms seems laughably
bad. I have to be missing something here.
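
For a sense of scale, Little's law (average requests in flight = throughput x
average latency) turns those two figures into a concurrency number. A small
worked example, with a hypothetical helper name:

```python
def in_flight_requests(throughput_rps: float, mean_latency_ms: float) -> float:
    """Little's law: average number of requests being served at once."""
    return throughput_rps * mean_latency_ms / 1000


# Figures quoted above: 1k requests/second at 100 ms average latency.
concurrent = in_flight_requests(1000, 100)
```

That works out to roughly 100 requests in flight on average. On its own this
says nothing about whether the latency is good or bad; that depends on what
each request actually does.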

~~~
wqwh
When you write Python you already know you're throwing the performance out of
the window. In fact based on my experience those numbers are pretty good.

~~~
avip
It's trivial to get 1kr/s with the simplest of wsgi flask behind nginx without
touching anything. But these "comparisons" are meaningless without asking what
"serving the request" is about.

~~~
y4mi
as much as i love the python ecosystem and django in particular, the parent is
definitely correct.

Don't ever choose python for performance. It's definitely _fast enough_ for
most things, but it's just not the tool to leverage if you actually need
performance.

It's wonderful if you just want to get shit done though, there are just so many
good libraries around that you can use to get your project finished without
committing several times the effort it takes in other languages.

------
i0exception
Disclaimer: I work at Mixpanel

We have a monorepo and I imported our entire commit history into Mixpanel -
[https://twitter.com/i0exception/status/1010663994435067904](https://twitter.com/i0exception/status/1010663994435067904)

As you can see, over the last 3 years, we've rewritten large parts of our
infrastructure in golang. While we still use python for a lot of things, we
felt that the type safety and concurrency primitives in go were a much better
fit for writing some of our core services.

~~~
hu3
250k lines of Go in ~3 years, around 1k per weekday is impressive.

Adding 1k lines of code per day to a monorepo while keeping the project
manageable is no easy feat.

Is it split into microservices? Any tips to tame the beast?

~~~
i0exception
We try to avoid microservices wherever possible. If we're adding something
new, it typically starts off as part of the service being deployed - either as
a container within the pod (we use kubernetes) or as a library that the code
can use. If something grows big enough in a way that it can't scale with the
service it's running with, we split it into a separate service. The opposite
is also true - if a service that we run no longer warrants a separate
deployment, we make it a container or a library. We use GRPC for most
communication and interfaces for anything that travels package boundaries.
Both of these help with making the split/aggregation a lot easier to manage.
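
A minimal sketch of the idea of coding against an interface so that a library
can later become a service (or the reverse). It is written in Python rather
than Go, every name is hypothetical, and `send` merely stands in for an RPC
call such as a generated gRPC stub; it is not Mixpanel's code.

```python
from typing import Callable, Dict, Protocol


class EventCounter(Protocol):
    """The interface callers depend on."""

    def count(self, event: str) -> int:
        ...


class InProcessCounter:
    """Library-style implementation living inside the calling service."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}

    def count(self, event: str) -> int:
        self._totals[event] = self._totals.get(event, 0) + 1
        return self._totals[event]


class RemoteCounter:
    """Client-style implementation. `send` is an assumed stand-in for an
    RPC transport, not a real library API."""

    def __init__(self, send: Callable[[dict], dict]) -> None:
        self._send = send

    def count(self, event: str) -> int:
        reply = self._send({"method": "count", "event": event})
        return reply["total"]


def handle_request(counter: EventCounter, event: str) -> str:
    """Caller code is identical for both implementations."""
    return f"{event}: {counter.count(event)}"


# Fake "server" used only to exercise RemoteCounter without a network.
_backend = InProcessCounter()


def fake_send(message: dict) -> dict:
    return {"total": _backend.count(message["event"])}
```

Because `handle_request` only knows about `EventCounter`, swapping
`InProcessCounter()` for `RemoteCounter(fake_send)` changes the deployment
shape without touching the caller.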

------
59nadir
If you're able to make a faster service handling many concurrent requests in
Python, you indeed should've never used Erlang from the beginning, because you
clearly didn't know at all what you were doing. This article is spot on in
that regard.

------
giancarlostoro
Sometimes using the right tool for the right job also means making sure you
have people who can use said tool.

~~~
dragonwriter
Or, part of the job that the tool has to be right for is the team, which can
be as much a part of the job definition as the output.

OTOH, if there is strong reason to think the tool is otherwise correct,
finding resources to enable the team to gain and/or borrow the knowledge they
are currently lacking should be practical.

~~~
giancarlostoro
> OTOH, if there is strong reason to think the tool is otherwise correct,
> finding resources to enable the team to gain and/or borrow the knowledge
> they are currently lacking should be practical.

It's incredible companies don't invest in training enough, and this goes both
for the companies who provide training (causing companies to avoid wasting
money on it to begin with) and the ones who need training for their teams.

A senior I worked with mentioned how he learned OO when the company he worked
at had everybody trained on it, and hasn't forgotten it since.

But now it's all the job of the degrees, right? They'll totally cover
everything your company needs to use every obscure tool you want to be used
for pennies on the dollar.

~~~
guitarbill
A simpler explanation may be nobody wanted to learn Erlang, not that the
company didn't want to provide training. I'm still not sure myself what the
benefits of learning Erlang are with regards to my salary. I'm sure there are
highly paid Erlang jobs out there, but so far, my career has never been
defined by a single language.

~~~
giancarlostoro
For this particular company, yes. I did say "companies", though, speaking of
the industry as a whole. It seems on-the-job training is no longer getting the
type of investment it once did.

------
tdumitrescu
FWIW the server in question is really simple. It has since been ported to
Golang, which has become one of Mixpanel's primary languages (alongside Python
and JS/TS).

~~~
arendtio
Now this will make the Erlang people happy :D

Sometimes I have the impression that functional programmers dislike Go
especially for having such an inelegant language design ;-)

Personally, I don't have hard feelings for either side: I love Go and like the
functional style in general (I know a bit of Scheme). Erlang is one of my top
2 languages I want to learn (together with Rust).

~~~
StreamBright
To learn Erlang is not that difficult. I think it is a bit harder to
understand how to write production-ready, scalable OTP code. Btw. you should
check out Elixir. It is much easier to get into Elixir nowadays and many
aspects (like deployments) are solved, while with Erlang you need to do a bit
more to get the same results. You can also just toy with Erlang without OTP
(not really recommended). :)

------
choiway
Short version:

How? We had an intern build it.

Why? Intern didn't know erlang.

~~~
yaleman
> No one on our team is an Erlang expert

... and neither did we.

------
phoe-krk
> No one on our team is an Erlang expert, and we have had trouble debugging
> downtime and performance problems. So, we decided to rewrite it in Python,
> the de-facto language at Mixpanel.

So they have changed the language to suit the programmers instead of changing
programmers to suit the language.

~~~
robertAngst
>Pick a language that has limited programmers, bring programmers from around
the country, pay them well beyond 6 figures so they don't leave and have a
crisis every time a programmer leaves.

OR

>Go with a mainstream language that they teach every CS kid and has every
googleable question imaginable on SO, hire 1 experienced programmer to manage a
bunch of post-college kids.

There are lots of good reasons to pick a language with a strong community.
Once in a while, I hear people recommend obscure languages(or up-and-coming),
and I think- that is going to be expensive to maintain.

~~~
kvakvs
You do not need a large team of Erlang developers though, a much smaller team
can hold the project afloat, even hire consultants to fix it and leave it for
a few years. Also Erlang dev salaries aren't that stellar, just average what
you'd pay a Python or a Java person.

~~~
shRaj9fEc8Vith
it's almost impossible to find an Erlang dev in my country :D

of course we can have someone learn and then teach to the rest of the team but
why bother doing that when you can select another language where it's a lot
easier to hire

------
YeGoblynQueenne
>> I’ve learned a lot about how to scale a real service in the couple of weeks
I’ve been here.

I take this to mean that the new server implementation in Python was written
in at most two weeks by a developer who was an intern at the time. I don't
think I'd be easily convinced that best results could be obtained this way.

The original Erlang implementation could be flawed and buggy, but wouldn't a
much better result be achieved by fixing its problems, rather than rewriting
it from scratch to another language, that is not known for its efficiency or
for being very good for networking, unlike Erlang?

In any case- is there ever a case where an intern can write the best possible
system in two weeks? And why would you hand off a critical part of your system
to the most junior member of the team? I mean, unless your team are all elite
hackers from the higher echelons of computer science masters and the "most
junior member" is one of the best programmers in the world, or so?

------
drudv
In our team we have two enthusiasts who advocate ReasonML for front-end
development. Personally, I like many of its features (for instance, pattern
matching) that JS/TS is missing. There is a lot of talk about it in the React
community. We even tried it in some small projects, and it seems like we could
start using it more widely. But situations like the one described in the
article make us concerned. We are following developments in the ReasonML
community, but for the moment we don't dare to start adopting it.

------
bsaul
The term « erlang expert » made me wonder : since erlang is supposed to be a
much saner and stricter architecture ( share nothing, code inside the actor in
a synchronous manner, etc), what are the usual troubles someone encounters
performance-wise in that language ?

Apart from a particular cpu intensive task that would make just a single
request perform badly, it seems ( from just reading about the language) that
just following the guidelines ensures top performance.

------
njharman
Some things are timeless. 8 year old articles on tech stacks maybe not so
much.

------
dhab
Elixir, which runs on the Beam VM, had piqued my interest for a while, so I
finally set out to learn about it this weekend. While searching for "flaws" of
the language and the platform, I (accidentally) ran into:
[https://www.youtube.com/watch?v=42k70Y-yTYY](https://www.youtube.com/watch?v=42k70Y-yTYY)
. (year 2017)

Video description:

    
    
      ...What many developers don’t understand is that Erlang is 
      built on an architecture and within ecosystem 
      that contains many subtle security flaws. 
      One such set of flaws allows anyone with the ability to 
      interact with a remote Erlang node to compromise 
      that node by abusing the underlying BEAM Virtual Machine 
      and the services required to run Erlang...
    

My notes:

* Looks like a deep issue in VM architecture

* It's not detectable at all

* Ericsson was informed 1 year prior to the talk, and their recommendation is to not expose nodes publicly

  * Speaker's argument: Yes, but the threat still exists for another internal
    project to exploit it

* Speaker's belief: It doesn't look like it is going to be fixed (any time soon)

To me this sounds like a very serious issue, to the point that I have crossed
off anything on Beam: I wouldn't build (or learn to build) on it, and I
wouldn't trust (note: I didn't say wouldn't use) other software that was built
on it.

Overall, it seemed like a great platform that scaled up to a certain point with
good guarantees on latency, throughput, and resource utilization. Not to
mention, it seems like (probably) the only platform that does pre-emptive
scheduling. Heartbroken that after being marketed as "battle tested", this
aspect of the VM/lang has gone under the radar for so long.

~~~
bdibs
This issue is overblown: simply have a firewall that exposes only the ports
you use and you’re fine.

There’s also an option within the release tool distillery to limit it to the
local network.

Not to mention even if an attacker found your unsecured setup, they’d also
need to know your “cookie”/key to do anything. It’s no different than leaving
SSH login with password enabled on any server.

~~~
dhab
The speaker claims that the cookie/key can be learned by causing a VM crash.
This is done by exploiting the monotonic nature of cookie values, which are
stored as atoms; there is a fixed cap on how many atoms can exist, and
exceeding it causes the VM to crash. He estimates the time to do that to be 10
mins (I forgot the exact memory limit)

~~~
bdibs
That’s definitely an issue, but again can be easily mitigated with a simple
firewall that doesn’t allow every port to be open to the world.

------
rlander
Previous HN discussion:
[https://news.ycombinator.com/item?id=2852415](https://news.ycombinator.com/item?id=2852415)

~~~
halbuk
The top comment is especially relevant: _" This is more about the technical
competency of a specific company than general technical issues. Or, to put it
bluntly, it's more "Mixpanel sucks at Erlang" than "Erlang sucks". Don't get
me wrong, I'd be really, really interested in a good analysis why in this case
Erlang was the wrong choice, but this article didn't even get close to
anything technically interesting"_

------
StreamBright
My problem with such articles is that I very rarely see an in-depth argument
with numbers and details to justify the switch. Many engineers are just too
lazy to drill down and get to know their stack so that they can scale it up
when necessary, add or remove features, and fix bugs with reliable velocity. I
have spent some time in California working for startups, and the ratio of
engineers who know their stack in depth to ones who would switch languages
because of trends and never really get to know anything in depth is 1:100.

------
JimDabell
Needs a [2011].

~~~
vector_spaces
My bad, fixed. Thanks

------
sk5t
Gee whiz, you mean it's a good idea to choose a language/framework one knows
well for implementing production services? Or maybe bring in an Erlang expert
as a consultant to fix things?

