This also reminds me of the Erlang VM handling 2M connections for WhatsApp on a single server in 2012:
Agreed, I'd be quite curious how many libraries in the Go ecosystem they can't use as a result. Either because the library spawns a goroutine, uses one under the hood, etc.
Having to drop to this level makes me wonder if it'd be better to just use a language better suited to this type of asynchronous networking (C/C++/Rust).
The HTTP library is unusual in that sense.
At the same time, the beauty of how the http library is designed, and of the solution he describes, is that there are hooks that let you be even more efficient with very little code.
Or to put it differently: Go gives you excellent (compared to everything else out there except C++/Rust) networking performance out of the box and you can go even faster with a minimal amount of effort.
What you call "dropping to this level" is 80 lines of code (https://github.com/eranyanay/1m-go-websockets/blob/master/3_...) and now you don't even have to write them yourself.
There are plenty of libraries that involve channels (and assume you have a goroutine per connection/client); do those play well with the use of epoll in this manner? I would assume I can't use the stdlib time package to do a timeout, for example, since that provides a channel to wait on, while in this setup I need objects that work with epoll. Obviously in that case I could use evio, which abstracts over epoll/kqueue and provides a Tick...
So my comment was more that "dropping to this level" means dropping the majority of standard Go concurrency idioms, which revolve around goroutines and channels. It's not about the raw lines of code involved. Writing code in this async style doesn't feel very Go-like, and when you look at an example of code using it, it's exactly the thing Go was trying to avoid with goroutines (https://github.com/tidwall/evio/blob/master/examples/http-se...).
You end up with async event-based code, not the clean synchronous-appearing Go code that goroutines and channels provide.
1M simultaneous users of a React (or other) app, is that a decently simple case? What are some sites that have this level of activity? I found a 5yr old article that says Spotify had 20MM simultaneous users then, but spread over 12,000 machines, so suffice it to say I'm having trouble finding a use-case here besides (good!) research.
Say you publish a quiz to 1 million users. Everyone responds and you store the responses centrally. Now you send out an aggregated view of the results (e.g. % who answered A).
Now, you want to inform one or more users that they won a prize. How do you do this efficiently?
Or better, you want to display the ranking to each user if there are multiple quizzes.
When you consider the price of the hardware and complexity of the system, it's obviously useful not just for handling 1M connections per host but also 10k connections per host.
If some other technology/framework can only do 1k connections per host, then you can pay 10x less for hardware and have room to spare. And the system will be simpler, which means faster to develop.
I want to sleep.
The other thing I'd be interested to see Elixir demonstrate would be doing a hot deployment to avoid triggering all 2m connections to try reconnect at the same time.
The other important note with the Phoenix example was that they were starting 3 processes for each socket (to supervise it and handle failures/reconnects), if I remember correctly.
"There was one fundamental mistake made, however, which is that we shouldn't have used channels. ... First, they don't perform well enough. ... Second, they make it very hard to prevent message loss. ... Third, the buffered channels mean that Heka consumes much more RAM than would be otherwise needed"
It sounds like in this case the message-loss-prevention rendered the original design flawed. I don't see any reason why you couldn't use an on-disk queue in Go vs other languages... though the cgo overhead of the lua binding sounds like it was also an issue.
Having them all do something useful at the same time is the hard part. (And no, "async" won't save you here.)
This is just one recent example:
The money quote:
"The end result enabled us to reduce 25 instances (c4 xlarge) running Clojure code - able to process 60 concurrent requests, to two instances (c3.2xlarge) running Go code able to support ~5000 concurrent requests a minute"
If you google around you'll find more stories like that.
To do better than Go you would have to drop to C++ or Rust.
Sure but how much of the performance gains come from Go and how much come from just having a better understanding of the problem the second time around?
PyPy leads to a nearly 10x speedup over CPython as well.
That doesn't make any sense though.
In any case, as the article text also mentions, on the Go side they are using a complete reverse-proxy library that's included with the Go stdlib, which can be a significant advantage aside from the properties of the language itself.
But then it seems they ended up reimplementing many things that the JVM provides: "In order to achieve out of the box functionality such as CPU and memory usage metrics, business logic counters and more - we needed to write basically this entire stack from scratch, which enabled us a much more rapid deep dive of the intricacies of Golang."
C++ and Rust yes, but also Java and .NET.
Idiomatic Clojure is routinely slower than idiomatic Java, and that's a well-known and expected outcome.
Nothing special other than having to tune Linux
That is not totally true. This is a mix of two things: using the JVM (which, like you said, is tuned and optimized for heavy loads) plus using a truly asynchronous and reactive programming (and I/O) model built on great technologies (in this specific case: Kotlin, Eclipse Vert.x and Netty).
As an experiment, if you picked another random set of libraries (imagine a servlet container), achieving the same results would not be so trivial; see for example:
And observe that Eclipse Vert.x is on the top for these reasons while other JVM frameworks are far behind.
This means that in order to test 1M simultaneous connections, you would need to use at least 16 client IP addresses. Probably more.
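The arithmetic behind that: a TCP source port is a 16-bit number, so one client IP can hold at most ~64k concurrent connections to a single server IP:port, and the default Linux ephemeral range is smaller still. That gives roughly 1,000,000 / 65,535 ≈ 16 client IPs as a floor. A common client-side tuning (my example, not from the article) is to widen the ephemeral range:

```shell
# Show the default ephemeral port range (often 32768-60999, ~28k ports).
sysctl net.ipv4.ip_local_port_range

# Widen it to nearly the full 16-bit space (~64k usable source ports
# per client IP, per destination IP:port).
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Even then: 1,000,000 connections / ~64,511 ports per IP ≈ 16 client IPs minimum.
```

Note the limit is per (source IP, destination IP, destination port) tuple, so the server side has no such cap; it's only the load-generating clients that need multiple addresses.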