
> Why WhatsApp Only Needs 50 Engineers for Its 900M Users

Answer: because sending short messages from A to B is basically a solved problem. There is even a programming language (Erlang) that was made with this application in mind. The prototypical "Hello World" example for Erlang is a messaging application.

I'm sure you could do it in an afternoon, right? Just like anyone on HN can make a "basic CRUD app" like Facebook in a weekend.

I really dislike the arrogant attitude behind these types of comments. Have you worked at WhatsApp? Do you have any idea what they spend their days doing, what their systems look like, what their requirements are like?

WhatsApp has close to a billion users and 50 engineers, and they manage to produce a near-flawless experience. This is not trivial, no matter how many armchair critics claim otherwise. It used to be the case that people dismissed FB as just another CRUD app ("not even written in RoR, but in PHP, yuck" ~2005-2010), but given the tools they have been pushing over the last few years, people seem to have stopped dismissing them so casually. Perhaps in a few years people will stop dismissing WhatsApp so casually as well.

I believe that the line of reasoning of your parent still has merit. Erlang/OTP was developed for telephony systems and has proven to be very reliable for distributed systems. Messaging maps relatively nicely onto Erlang/OTP, and as such WhatsApp is leveraging decades of research and work by Ericsson. Of course, this is how it should be: good software infrastructure should save its users time.

Obviously, creating WhatsApp is nothing near a weekend project. But I think their founders should be commended especially for avoiding two classic mistakes: NIH syndrome, and going on a hiring spree once the investments started rolling in. Their managerial skills are at least as good as their engineering skills.

I think that's indeed the important thing here. The article focuses a bit too much on intrinsic qualities of the language, as if it were the silver bullet of scalability, while the real lesson is that they used the right tool for the job.

Not so long ago, maintaining hardware that could support 900M users would have required an IBM-scale company. A single developer today can develop, deploy and maintain a medium-scale ecommerce site using infrastructure like AWS, a myriad of open source solutions, plenty of services to handle payments, ... yet he will probably be using the same language as 15 years ago, when 20 times more developers were needed to achieve the same result.

From the article:

* SoftLayer is their cloud provider, bare metal machines, fairly isolated within the network, dual datacenter configuration

SoftLayer is an IBM company, so ultimately it DOES require an IBM-scale company to run this (at some point in the chain).

Indeed. It's amazing :) . All the infrastructure-as-a-service stuff REALLY cuts down the overhead.

I'm building a product that only a few years ago would have required a VERY significant investment, but now is basically stacking and connecting lego blocks of services :) - sales-as-a-service isn't mature yet, but if we had it, it would even sell itself :) .

Ok, it's probably a pain if one of the black boxes fails, and I do know enough about the underlying architecture behind the services I'm using - I could probably replicate (badly) most if not all of them given time and money, but why would I?

> Ok, it's probably a pain if one of the black boxes fails, and I do know enough about the underlying architecture behind the services I'm using - I could probably replicate (badly) most if not all of them given time and money, but why would I?

I don't think the importance of this possibility can be overstated. In most of my apps running on IaaS/PaaS offerings like Heroku, key pieces of the infrastructure tend to fail for short periods (typically at very inconvenient times) fairly frequently, on the order of once or twice a week. The relative cost of this is highly variable and business-dependent, but at best it's mildly inconvenient and at worst it can be crippling. It's definitely something that needs to be accounted for. That said, I've also been lead on apps that used the same pieces of server infrastructure and had far better uptime when I handled ops myself instead of delegating to another provider. (Not to pick on the providers: they have a very difficult job to do in a cost-effective way, and I'm quite certain they do it better at that scale than I would.)

The choice of Erlang for WhatsApp was strategic if for no other reason than that it allows them to dispense with some of those choices. Many of the typical use cases of Redis, for example, are easily supported by OTP.

I'm not sure how you managed to translate "It's not surprising 50 engineers are enough" into "I could do it in an afternoon".

Essentially, yes? I mean, are you suggesting these engineers are doing magic? That they wrote some new and amazing code that no one else could write? That these 50 people are the only ones who could do this and somehow all ended up at WhatsApp? Or are they just leveraging powerful systems efficiently?

Have you worked at WhatsApp? Do you have any idea what they spend their days doing, what their systems look like, what their requirements are like?

Do you? If yes, please enlighten us. Otherwise I don't see the point of your objection. No one said that it can be done in a weekend. All that the parent comment said is that the fundamentals of building WhatsApp are already solved. He/she didn't dismiss it. Just because something isn't rocket science doesn't mean it's not useful.

As for Facebook, people used to dismiss it not because it was seemingly too easy to build but because they failed to see its usefulness. Long before it became a platform, it was pretty much seen as a waste of time. Obviously, anything with more than a hundred million users is difficult to maintain.

Not the parent, but search for Erlang Factory talks; they talk about it. Here is one from Rick Reed, where he explains how the product works:


I can personally guarantee to you from direct current experience that there is nothing particularly solved about running services for even 50 million users.

Go watch this: https://www.youtube.com/watch?v=TneLO5TdW_M#

Hang on, I _could_ write a (small) twitter or (early) fb clone in a weekend. Easily. That would probably scale to 10k tweets or feed entries.

What I in _no_ way could do is rewrite fb or twitter at the scale these systems operate today with a billion users and even more posts and data points. I don't agree with the sentiment of the previous poster, but fb & twitter were initially basic CRUD apps.

I have to disagree. They were messaging brokers masquerading as or mistaken for CRUD apps, which directly led to many of their early scaling issues.

But you can just implement them (the early versions) on any old server with a nice database installed, if you want to. And you can do that in a weekend.

And yes of course you will want to take a step back and change almost everything about it if you want it to scale to more orders of magnitude.

This might be me being exceedingly charitable to the OP, but I see his point differently:

There's a ton of work involved in making a platform like WhatsApp a polished user experience. There's then a ton more work involved in making it _solid_. In choosing Erlang, they chose a platform that trivialises a large part (though not all) of the latter, allowing them to focus on the former.

I think the reality, as is often the case with two extremes, is somewhere in between.

Is this a simple 'solved' problem that anyone could do? No. But is it a superhuman feat that only a handful of people could achieve? IMO, no as well. I think people in the latter camp like to believe this because it's something they can aspire to; to say otherwise is to devalue their vocation. In the former camp, I think people are pushing back against this too far in the opposite direction.

>I'm sure you could do it in an afternoon, right? Just like anyone on HN can make a "basic CRUD app" like Facebook in a weekend.

Maybe the backend. I couldn't write a BlackBerry Java app, though, which was WhatsApp's initial value proposition.

Well, frankly, there are plenty of open source chat server implementations (usually following the XMPP standards) which easily scale to tens of thousands, sometimes hundreds of thousands, of users without any modifications (Openfire, to name one). It wouldn't be that difficult for a team of people to scale that into a cluster of servers and support several hundred million users at once -- XMPP packets are small, and this problem has been solved many times.
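For reference, the wire format these servers speak is tiny: a one-to-one chat message in XMPP (RFC 6121) is a small XML stanza along these lines (addresses are placeholders):

```xml
<message from='alice@example.com' to='bob@example.com' type='chat'>
  <body>Hi Bob</body>
</message>
```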

From a technical standpoint, WhatsApp, the product, and the scale they operate at are nothing new.

Interesting! When did an XMPP server run with hundreds of thousands of users? At Gmail?

Google Talk, Facebook Messenger, MSN Messenger, AOL Instant Messenger (AIM), etc...

(Note that both Google and Facebook use proprietary chat protocols now, I believe.)

I second you. People throw in things like "Duh, it's a lame technology, duh, I am working on something far superior, duh, that 3 people will ever use, duh"

Don't forget WhatsApp has applications on all platforms.

Well said.

Sure, the initial business case is "Send a message from A to B". But it's not hard to think of more things to do:

- What if A has a Nokia and B has an iPhone?

- What if you want A, B, and C to have a chatroom?

- What if someone starts spamming random people?

- How do we QA the thing for all these platforms?

- How about someone to keep the visual design fresh?

- What about a million corner cases like when someone is not online, someone sends a message that's too big, one of the servers goes down, and so on?


You're right that the scope is not as big as some other apps, and that's what makes 50 a sensible number rather than 5000.

Pretty much every single one of your test cases has already been solved in the XMPP specification, which WhatsApp is likely using (essentially all modern chat services/servers/clients speak XMPP).

Sure. And someone's got to take the XMPP package and put it on the different platforms, test them all for correctness, etc. I'm not saying there aren't existing parts that can be used. But even if you have a plausible component, someone's got to look at the docs, maybe build a toy version, think about whether you're hitting the limits, integrate with the rest of the codebase, and so on.

It's easy to see what pieces you need. But there's still work in gluing them together and testing the result.

I think 50 is a sensible number of in-house engineers for this sort of thing. You want cover as well in case someone leaves or gets ill. You want people to be able to upgrade the system when things change. You want people to look ahead at future requirements. And you want to be able to do these things all at once, with slack built in for peak load.

Well, you sort of are making my point -- 50 engineers is plenty (maybe even too many) to cobble together a solution to a problem that already has been solved and solved again.

WhatsApp did nothing technically challenging, but they did strap a business model (err, sort of) to something many others failed to capitalize on.

Yep. And of course, Erlang is really only a functional language in the small, the parts that don't matter so much. In the large, arguably the scale that matters, it's an actor language, with highly encapsulated entities exchanging messages. So essentially what Alan Kay had in mind when he coined the term "object oriented".
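The actor point can be sketched outside Erlang, too. Below is a minimal, hypothetical actor in Python: private state, a mailbox, and a receive loop. (Erlang processes are far lighter-weight and come with supervision trees; this only illustrates the shape of "highly encapsulated entities exchanging messages".)

```python
import queue
import threading

class Actor:
    """A minimal actor: private state, a mailbox, and a receive loop."""

    def __init__(self):
        self.mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def send(self, msg):
        # The only way to interact with an actor is to send it a message.
        self.mailbox.put(msg)

    def _loop(self):
        while True:
            msg = self.mailbox.get()   # messages are processed in order
            self.receive(msg)

    def receive(self, msg):
        raise NotImplementedError

class Counter(Actor):
    def __init__(self):
        self.count = 0                 # state is encapsulated in the actor
        self.replies = queue.Queue()   # out-of-band channel for replies
        super().__init__()

    def receive(self, msg):
        if msg == "incr":
            self.count += 1
        elif msg == "get":
            self.replies.put(self.count)

c = Counter()
for _ in range(3):
    c.send("incr")
c.send("get")
print(c.replies.get())  # -> 3
```

Since the mailbox is FIFO and only the actor's own loop touches `count`, no locks are needed around the state, which is exactly the property the actor model buys you.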


Indeed, Joe Armstrong once said that he believes Erlang to be the only OO language according to Alan Kay's idea [1].

Also, let's not forget to give credit to Carl Hewitt who came up with actors in the first place [2].

[1] http://www.infoq.com/interviews/johnson-armstrong-oop

[2] https://www.youtube.com/watch?v=7erJ1DV_Tlo

Upvoted for correctly invoking Alan Kay's vision of object oriented. :)

This comment focuses on the back-end. As important as that is, as far as I remember, when FB acquired WhatsApp, one of the main advantages was the presence of clients on almost any mobile device (thanks to J2ME). I am pretty sure it is not trivial to maintain so many clients, even though over time the older clients will disappear and it will be mostly Android / iPhone / Windows (??) that the engineers need to maintain.

The only thing simpler would be sending messages to a centralized queue for people to fetch. Yet Twitter needs 4100 employees for that (though not all engineers I hope).

Centralized queue and global fan-out makes it only harder to do at scale, not easier. With 1-to-1 messaging, it's trivial to shard the whole system.
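To make the sharding claim concrete: with 1-to-1 messaging, delivery is a pure function of the recipient's ID, so each message touches exactly one shard. A toy sketch (shard count and names are made up for illustration):

```python
import hashlib

NUM_SHARDS = 16  # hypothetical; real deployments size this to capacity

def shard_for(user_id: str) -> int:
    """Route a user to a shard via a stable hash of their ID."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def deliver(sender: str, recipient: str, text: str, shards: list) -> None:
    # 1-to-1: the message is written to exactly one shard, the recipient's.
    shards[shard_for(recipient)].append((sender, recipient, text))

shards = [[] for _ in range(NUM_SHARDS)]
deliver("alice", "bob", "hi", shards)
# Contrast with Twitter-style fan-out, where a single post may have to be
# materialized into millions of followers' timelines across many shards.
```

Because the routing function is stable and local, shards never need to coordinate on a delivery, which is what makes scaling out "trivial" in the sense used above.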


Twitter is not 1-to-1 messaging, generally.

Loads of recognition for erlang, but not a single word about ejabberd in the article...

Erlang has the cred of an esoteric language, and you can hype up your company for using it; the allure of unusual languages seems to be in fashion right now (the article goes on tangents about Facebook/Haskell, Mozilla/Rust and Google/Go).

Talking about ejabberd is admitting that a core piece of the product is standing on the shoulders of giants. Something startups are often loath to do.

I bet that WhatsApp's infrastructure would work with ~35 engineers.

That's a social problem though (the fact that 35 guys generate so much cash, while on the other side of the planet 350 million people are looking for a way to pay the rent), not a technical one :-)

Actually, over lossy links, it's well known to be an impossible problem:


I don't see how that's an issue here. The recipient doesn't need to know if his message receipt has been received. The burden remains with the sender to retry, just like TCP. The real Two Generals' Problem in instant messaging can just be passed to the communicating humans to deal with.
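The retry-until-ack discipline described here can be sketched in a few lines of Python (all names hypothetical; this is the generic at-least-once pattern with receiver-side deduplication, not WhatsApp's actual protocol):

```python
import uuid

class Receiver:
    """Dedupes by message ID, so the sender can retry blindly:
    at-least-once delivery becomes effectively-once processing."""

    def __init__(self):
        self.seen = set()
        self.inbox = []

    def handle(self, msg_id, text):
        if msg_id not in self.seen:     # retries of a seen ID are no-ops
            self.seen.add(msg_id)
            self.inbox.append(text)
        return ("ack", msg_id)          # the ack itself may be lost

def send_with_retry(receiver, text, link, max_tries=5):
    """Sender keeps one message ID and retries until an ack comes back."""
    msg_id = str(uuid.uuid4())
    for _ in range(max_tries):
        if link(receiver, msg_id, text) is not None:
            return True                 # delivery confirmed
    return False                        # give up; surface to the human

# A link that delivers the message but drops the first two acks.
drops = iter([None, None, "ok"])
def flaky(receiver, msg_id, text):
    ack = receiver.handle(msg_id, text)
    return ack if next(drops) else None

r = Receiver()
assert send_with_retry(r, "hello", flaky)
print(r.inbox)  # -> ['hello'], despite three transmissions
```

The receiver never needs to know whether its ack arrived; the sender's retry loop plus the `seen` set make the outcome correct either way, which is why the Two Generals' impossibility result doesn't bite here.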

You are right. But on that level, even breathing is an impossible problem, because the air molecules could statistically bounce out of the room.

A whatsapp conversation is not dependent on the parties coming to a consensus on the timing for a common event.

Also, an army can only send so many messengers before it runs out of soldiers (or operational readiness). Packets are free and plentiful. If you're on a network link so bad that you can't even occasionally receive a TCP ack, then you're very much an outlier.

By that logic, CouchDB would have been a big success and wouldn't have suffered from scalability problems.

Also make sure to support offline devices being sent text, audio, video and image files (and the client application on several platforms)
