
Show HN: 2M fully loaded concurrent WebSockets - lganzzzo
https://oatpp.io/benchmark/websocket/2-million/
======
PaulHoule
Reminds me of the good old days when an IBM 360 running CICS could support
13,000+ terminals with just 16 MB of RAM.

I thought shoehorning was a microcomputer thing until I found out about that!

~~~
drudru11
Is there a URL that talks more about how CICS pulled that off?

~~~
tyingq
3270 terminals are block mode, so there's less state for the server side to
deal with as compared to a typical Unix terminal.

I'm sure there's more clever stuff, but that does help a lot.

~~~
PaulHoule
It's actually not that different from a traditional (pre-AJAX) HTML forms web
app.

That is, the mainframe would send out a burst of data to draw a screen on the
terminal, the user would fill out the fields, then the user would hit a button
that sent the field content back to the mainframe.

Thus the mainframe did not have to handle an interrupt for each character
but instead just one for the whole form.
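
As a rough illustration of that difference, here is a toy Python model that
counts host-side input events for one form in character mode versus block
mode. The field names and values are made up, and real 3270 or Unix terminal
handling involves far more than this.

```python
# Toy model: count how many times the host has to react to terminal input.
# The form contents below are hypothetical.

def character_mode_events(fields: dict) -> int:
    """One host-side event per typed character (Unix-terminal style)."""
    return sum(len(value) for value in fields.values())


def block_mode_events(fields: dict) -> int:
    """One host-side event per submitted form (3270 / HTML-form style)."""
    return 1 if fields else 0


form = {"account": "12345678", "name": "J SMITH", "amount": "100.00"}
print(character_mode_events(form), block_mode_events(form))
```

The gap grows with the size of the form, which is one reason a block-mode
host can serve many more terminals with the same resources.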

~~~
drudru11
Ah. Thanks guys.

~~~
PaulHoule
I'll fill in some more things.

They didn't have the bloat back then that we have these days, like the crazy
deep call stacks you see in both object-oriented and functional programming.

CICS programs were frequently written in assembler, sometimes COBOL and PL/I.
Mainframe compilers from the early 1970s were much more advanced in many ways
than the Unix/C technology described in the Dragon Book.

The mainframe did I/O through channel processors which were quite expensive in
themselves, but offloaded a lot of work.

CICS did many things that operating systems do in user space. IBM never did
produce a "ring to rule them all" operating system for the 360 series, but
with the 370 some academics figured out how to run multiple operating systems
in virtual machines. So CICS could run in a VM with the minimal OS that it
needed.

Mainframe systems had the source code for CICS and the OS and usually used a
custom build, so the kernel didn't have anything it didn't need.

The machine was expensive but cost effective if you used it efficiently, so
people did.

~~~
drudru11
Thanks for elaborating more. I think people who do performance work constantly
rediscover or re-invent this (i.e., like the websocket post).

------
lganzzzo
You may also want to check out:

\- benchmark project repo -
[https://github.com/oatpp/benchmark-websocket](https://github.com/oatpp/benchmark-websocket)

\- oatpp framework repo (which the benchmark is built with) -
[https://github.com/oatpp/oatpp](https://github.com/oatpp/oatpp)

------
vinay_ys
Your bottleneck might be packets/sec of 150k. Soft interrupt handling is
likely getting saturated. If you tune your network (receive side scaling etc)
for high packet throughput, you may be able to get the benchmark to find the
application bottleneck.

~~~
foobar502
How would you do that?

~~~
unmole
For UDP, this is an excellent resource:
[https://events.static.linuxfound.org/sites/events/files/slid...](https://events.static.linuxfound.org/sites/events/files/slides/LinuxConJapan2016_makita_160712.pdf)

------
timwis
Thought this was gonna be about elixir

~~~
nelsonic
Indeed the Phoenix benchmark with 2M concurrent connections
[https://phoenixframework.org/blog/the-road-to-2-million-webs...](https://phoenixframework.org/blog/the-road-to-2-million-websocket-connections)
was on a 40core/128GB machine but the load was a pretty realistic
"real world" use case.

If @lganzzzo's oatpp can indeed handle 2M concurrent on 8core/52GB with
_presence_ and real data being communicated/broadcast it would be worth
looking into.

We use Phoenix for a number of projects and in addition to handling lots of
Websocket connections it's a fully featured framework with an excellent
workflow, expressive ORM and seamless DevOps.

Seems like oatpp has been built with a single purpose in mind (similar to
Redis). Always good to see people diving deep into a topic to push the
boundaries of the state of the art.

~~~
bsaul
I'm very interested in the "seamless DevOps" part. Could you point me to a
link describing how you achieve that with Phoenix? (Do you use
Docker + Kubernetes, or OTP, or something else?)

~~~
arc_of_descent
Have a look at distillery for building Elixir releases, and edeliver for
deployment.

------
Thaxll
The memory seems high actually. I've seen other C / C++ / Go implementations
using less than 1GB for 1M connections. Pretty cool nonetheless!

[https://github.com/uNetworking/uWebSockets](https://github.com/uNetworking/uWebSockets)

[https://speakerdeck.com/eranyanay/going-infinite-handling-1m...](https://speakerdeck.com/eranyanay/going-infinite-handling-1m-websockets-connections-in-go?slide=35)

~~~
lganzzzo
Thanks. In this benchmark I decided to go without much framework tuning and
mostly took it as is, to see what I could get. In any case, oatpp is a
general-purpose web framework, so it's understood that dedicated libraries
like uWebSockets may be more optimized. Nevertheless, I believe there is still
a lot of room for tuning and optimizing oatpp.

------
nelsonic
@lganzzzo great work! (bookmarked for further reading...) Out of curiosity,
did you consider using Rust for this before using C++? Or did you dive straight
into C++?

~~~
lganzzzo
Hey @nelsonic,

Thanks for the question!

I dived straight into C++ because I had some groundwork written in C++ from
my previous projects.

------
chirau
Get your credit card or your infra ready, whichever applies... someone sitting
here with me is about to max out a botnet rush on you

~~~
lganzzzo
:)) My infra is $15/Month instance running oatpp server

------
CptMauli
Does anybody have real world experience (and maybe data) for websockets when
used in a mobile environment (with a lot of dropped connections)?

------
iandanforth
I'm confused. If there are 20M clients and the server is sending 9M messages
per minute, doesn't that mean that each client is sending less than 1 message
per minute?

~~~
lganzzzo
Not 20M, but 2M fully loaded sockets... about 1 message per 13 sec per
connection
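
For anyone who wants to check that arithmetic, here is a small Python sketch
using only the figures quoted in this thread (2M connections, roughly 9M
messages per minute). The function and variable names are just for
illustration.

```python
# Figures quoted in the thread: 2M connections, about 9M messages per minute.
CONNECTIONS = 2_000_000
MESSAGES_PER_MINUTE = 9_000_000


def seconds_between_messages(connections: int, messages_per_minute: int) -> float:
    """Average gap, in seconds, between messages on a single connection."""
    per_connection_per_minute = messages_per_minute / connections
    return 60.0 / per_connection_per_minute


interval = seconds_between_messages(CONNECTIONS, MESSAGES_PER_MINUTE)
messages_per_second = MESSAGES_PER_MINUTE / 60

print(f"{interval:.1f} s between messages per connection")
print(f"{messages_per_second:,.0f} messages per second overall")
```

That works out to about 13.3 seconds per connection, and the overall rate of
150,000 messages per second lines up with the 150k packets/sec figure
mentioned elsewhere in the thread.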

~~~
bennettlp
I might have missed it, but I didn't see any mention in the article of multiple
IPs or NICs. Isn't the theoretical maximum 65k ports per IP?

~~~
ec109685
No, the tuple includes source IP and port as well.
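
A minimal Python sketch of that point, assuming the usual TCP 4-tuple (source
IP, source port, destination IP, destination port). The addresses are
documentation-range placeholders and the helper name is hypothetical. The 65k
figure is a theoretical ceiling per client IP and server endpoint, and the
usable ephemeral port range on a real client is usually smaller.

```python
import math
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    """The 4-tuple that identifies one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


PORTS_PER_IP = 65_535  # 16-bit port numbers, excluding port 0


def min_client_ips(target_connections: int, server_endpoints: int = 1) -> int:
    """Fewest client IPs needed to reach the target, in theory."""
    per_client_ip = PORTS_PER_IP * server_endpoints
    return math.ceil(target_connections / per_client_ip)


# Same server endpoint and same source port, but different client IPs:
# these are two distinct connections.
a = ConnectionKey("192.0.2.1", 40000, "198.51.100.1", 8000)
b = ConnectionKey("192.0.2.2", 40000, "198.51.100.1", 8000)

print(a != b)
print(min_client_ips(2_000_000))
```

So the 65k limit applies per client IP against a given server IP and port,
not to the server as a whole. Adding client IPs or server ports multiplies
the number of possible connections.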

