Ask HN: Why do you think Signal's service fell over with this week's user surge?
5 points by aloukissas on Jan 16, 2021 | 2 comments
As of now, status.signal.org shows that Signal is still struggling to handle the admittedly huge influx of new users. However, these are smart guys behind the project (perhaps their strength is privacy, not system scalability? doubt it though). The question is: why did this happen? Especially given WhatsApp's famous super-scalability with just a handful of servers.

It looks from their GitHub repo that the Signal server is a Java project [1]. WhatsApp's "secret" weapon for getting such great scalability for so cheap was choosing to use Erlang. I would like to believe that this technology choice must have played a role here (not that people haven't built massively scalable apps with Java/JVM).

Now, why the slow response in increasing capacity? Are they perhaps not running on a cloud infrastructure, where adding more servers to the problem is as easy as a few clicks (or zero, e.g. with k8s autoscaling and other such tricks)? Perhaps, again for privacy/control, they're running their own metal at a colo or somewhere.

Finally, could it be that their clustering system works in a way that prevents them from just "throwing more servers at the problem" and is rather more intricate than this?

I think all this may be a fun "postmortem in the dark" experiment :)

[1] https://github.com/signalapp/Signal-Server



>WhatsApp's "secret" weapon for getting such great scalability for so cheap was choosing to use Erlang.

WhatsApp's "secret" weapon for getting such great scalability is the decade the founders spent at Yahoo building scalable systems, if I'm not mistaken. Give Erlang to someone else and they'll build a system that'll choke. The language is not the only part of the system.

There was a sudden increase in the number of users, amplified by many Twitter account holders with huge audiences tweeting about moving to Signal. The system was designed with tradeoffs and assumptions. These assumptions changed and the system is unstable. They're likely trying to reach a steady state with a new set of assumptions and tradeoffs.


Yup! And this is why I mentioned that the Signal team's strength may be more on the cryptography/privacy side than on distributed systems. However, Erlang/OTP and the BEAM do give you some great scalability features straight out of the box that can help you "cheat" further until you really have to dig deep into distributed systems scalability stuff.



