
Old box, dumb code, few thousand connections, no big deal - ingve
https://rachelbythebay.com/w/2020/05/07/serv/
======
kragen
Rachel presumably wrote her server in a reasonable language like C++ (though I
don't see a link to her source), but when I wrote httpdito⁰ ¹ ² I wrote it in
assembly, and it can handle 2048 concurrent connections on similarly outdated
hardware despite spawning an OS _process_ per connection, more than one
concurrent connection per byte of executable†. (It could handle more, but I
had to set a limit somewhere.) It just serves files from the filesystem. It of
course doesn't use epoll, but maybe it should — instead of Rachel's 50k
requests per second, it can only handle about 20k or 30k on my old laptop.
IIRC I wrote it in one night.
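For the curious, the process-per-connection model httpdito uses can be sketched in a few lines of Python (an illustrative sketch, not the actual assembly; `handle` and its bare-bones HTTP parsing are invented for the example):

```python
import os
import socket

def handle(conn, docroot):
    # Read one request, serve the named file, and hang up (HTTP/1.0 style).
    parts = conn.recv(4096).decode("latin-1").split(" ")
    path = (parts[1].lstrip("/") if len(parts) > 1 else "") or "index.html"
    try:
        with open(os.path.join(docroot, path), "rb") as f:
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + f.read())
    except OSError:
        conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
    conn.close()

def serve(port=8080, docroot="."):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen(128)
    while True:
        conn, _ = listener.accept()
        if os.fork() == 0:       # child: handle exactly one connection
            listener.close()
            handle(conn, docroot)
            os._exit(0)
        conn.close()             # parent: the child owns this socket now
```

The kernel does all the multiplexing for you; each connection's state lives in its process.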

It might sound like I'm trying to steal her thunder, but mostly what I'm
trying to say is _she is right. Listen to her. Here is further evidence that
she is right._

As I wrote in [https://gitlab.com/kragen/derctuo/blob/master/vector-
vm.md](https://gitlab.com/kragen/derctuo/blob/master/vector-vm.md), single-
threaded nonvectorized C wastes on the order of 97% of your computer's
computational power, and typical interpreted languages like Python waste about
99.9% of it. There's a huge amount of potential that's going untapped.
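A rough stdlib-only illustration of the interpreter-overhead slice of that claim (it understates the full gap, since the builtin still walks boxed Python ints; SIMD and multicore would widen it much further):

```python
import timeit

data = list(range(100_000))

def interpreted_sum(xs):
    # One bytecode dispatch, and one boxed-int operation, per element.
    total = 0
    for x in xs:
        total += x
    return total

t_loop = timeit.timeit(lambda: interpreted_sum(data), number=20)
t_builtin = timeit.timeit(lambda: sum(data), number=20)  # the loop runs in C
print(f"interpreted loop: {t_loop / t_builtin:.1f}x slower than builtin sum()")
```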

I feel like with modern technologies like LuaJIT, LevelDB, ØMQ, FlatBuffers,
ISPC, seL4, and of course modern Linux, we ought to be able to do a lot of
things that we couldn't even imagine doing in 2005, because they would have
been far too inefficient. But our imaginations are still too limited, and
industry is not doing a very good job of imagining things.

—

⁰
[http://canonical.org/~kragen/sw/dev3/server.s](http://canonical.org/~kragen/sw/dev3/server.s)

¹ [http://canonical.org/~kragen/sw/dev3/httpdito-
readme](http://canonical.org/~kragen/sw/dev3/httpdito-readme)

²
[https://news.ycombinator.com/item?id=6908064](https://news.ycombinator.com/item?id=6908064)

† It's actually bloated up to 2060 bytes now because I added PDF and CSS
content-types to it, but you can git clone the .git subdirectory and check out
the older versions that were under 2000 bytes.

~~~
highfrequency
> single-threaded nonvectorized C wastes on the order of 97% of your
> computer's computational power

Can you elaborate on what this means exactly? For example, is there some
reasonable C code that runs 33 times slower than some other ideal code? In
what sense are we wasting 97% of our computer's computational power?

~~~
augustt
A good example of getting a ~3000x speedup from naive matrix multiplication in
C here (slides 20 onward): [https://ocw.mit.edu/courses/electrical-
engineering-and-compu...](https://ocw.mit.edu/courses/electrical-engineering-
and-computer-science/6-172-performance-engineering-of-software-systems-
fall-2018/lecture-slides/MIT6_172F18_lec1.pdf)

Includes a 9-level nested for loop, which is always great to see.

~~~
kragen
Thank you very much for posting this!

Roughly that 3000× is 18× from multithreading, 3× from SIMD instructions, 15×
from tuning access patterns for locality of reference, and 3× for turning on
compiler optimization options. This is a really great slide deck!

I was assuming "single-threaded nonvectorized C" already had compiler
optimization turned on and locality of reference taken into account. As the
slide deck notes, you can get some vectorization out of your compiler — but
usually it requires thinking like a FORTRAN programmer.

So I think in this case reasonable C code runs about 54× slower than
Leiserson's final code. However, you could probably get a bigger speedup in
this particular case with GPGPU. Other cases may be harder to speed up on a
GPU but offer a bigger SIMD speedup. So I think my 97% is generally in the
ballpark.

A big problem is that we can't apply this level of human effort to optimizing
every subroutine. We need better languages.

~~~
mratsim
That's why you have people working on Halide, Taichi, DaCe, Tiramisu.

- [https://halide-lang.org/](https://halide-lang.org/)

- [http://taichi.graphics/](http://taichi.graphics/)

- [http://spcl.inf.ethz.ch/Research/DAPP/](http://spcl.inf.ethz.ch/Research/DAPP/)

- [http://tiramisu-compiler.org/](http://tiramisu-compiler.org/)

This way you can have a researcher implementing the algorithm (say, bilinear
filtering) and an HPC expert who tunes it with parallelism, SIMD, and tiling.

I wrote an overview of most DSLs for high-performance computing or image
processing in this issue:
[https://github.com/mratsim/Arraymancer/issues/347#issuecomme...](https://github.com/mratsim/Arraymancer/issues/347#issuecomment-459351890)

~~~
kragen
This is great! Which of these do you think could be extended to general-
purpose programming without the HPC expert? Taichi and DAPP seem to be aimed
at that goal, but you seem to be implying they don't reach it yet?

~~~
mratsim
You can use them without the HPC expert; Halide, for example, has a good
autotuner and has been used by Google and Adobe to create image filters for
mobile devices.

~~~
kragen
Thank you!

------
bjt
Whether intended or not, there's an undercurrent of "you're all so dumb for
using Python" (or Ruby, or PHP, or another similarly performant language) here.
I want to surface that and question it a bit.

It's totally reasonable for a company to choose the Python/Gunicorn option if
they already have a bunch of people who know Python and they don't need to
serve tons of requests per second.

Even if they do need to serve tons of requests per second, it's totally
reasonable for them to still choose Python/Gunicorn if the cost of the
additional servers is less than the cost of having to support multiple
languages. Or if they get a lot of value from libraries that are unique to the
Python ecosystem. Or if they care more about quickly iterating on features
than driving down server costs.

I agree that there's a point where it stops making sense, and there are plenty
of engineers who don't recognize when they're past that point because they
keep doubling down on sunk costs and things they're familiar with. But let's
not be too quick to assume people are in that camp when we don't know all the
tradeoffs they're facing.

~~~
rumanator
> Even if they do need to serve tons of requests per second, it's totally
> reasonable for them to still choose Python/Gunicorn if the cost of the
> additional servers is less than the cost of having to support multiple
> languages.

How hard is it to get up to speed on any other tech stack? ASP.NET Core is
extremely fast and the learning curve is close to none, for example.

If someone was able to wrap their head around backend development with Python,
I'm pretty sure they have the mental fortitude to onboard a tech stack that
doesn't suffer from major performance problems.

~~~
29athrowaway
1) .NET APIs change very frequently.

2) C# is a very verbose language that requires a lot of typing.

3) F#, the best language in .NET, is largely ignored by the .NET community.

~~~
rumanator
> 1) .NET APIs change very frequently.

ASP.NET Core 2.1 was released in 2018 and will be supported until late 2021.

ASP.NET Core 3.1 was released a few months ago and there is no end of support
in sight. Moreover, there weren't that many changes between 2.1 and 3.1; I
migrated a whole ASP.NET Core 2.1 web service to 3.1 in less than an hour.

> 2) C# is a very verbose language, that requires a lot of typing.

Nonsense. The only verbosity C# adds compared with Python is type
declarations, whose absence is arguably a problem plaguing Python. The
first-class support for events, async programming, and properties in C# more
than makes up for it.

> 3) F#, the best language in .NET, is largely ignored by the .NET community.

I fail to see what point you were trying to make.

~~~
flukus
> ASP.NET Core 2.1 was released in 2018 and will be supported until late 2021.

Which is way too unstable, especially for the kind of corporate environment C#
has typically been used in. Getting those places to upgrade to stable,
supported versions of the framework has always been a battle even when
backwards compatibility was great; if they have to deal with breaking changes
every few years, they will never upgrade.

This is why so many companies stick with their ancient COBOL systems, most
modern alternatives don't offer the stability they need.

~~~
rumanator
> Which is way too unstable,

ASP.NET Core 2.1 is the LTS release of ASP.NET Core 2, which was released in
2017. I fail to see how a first-class framework with an LTS released years ago
can be described with a straight face as "way too unstable".

> Getting those places to upgrade to stable supported versions of the
> framework has always been a battle

ASP.NET Core 2 has been stable for at least 2 or 3 years now, depending on how
you decide to count.

> This is why so many companies stick with their ancient COBOL systems, most
> modern alternatives don't offer the stability they need.

This assertion is simply wrong on so many levels. Don't confuse "why waste
money maintaining working software" lines of reasoning with a sign of respect
for stability.

More importantly, it's disingenuous to even think of the technical debt that
keeps COBOL on the map as relevant to the world of web services.

~~~
flukus
> I fail to see how a first class framework with a LTS that was released years
> ago can be described with a straight face as "way too unstable".

I fail to see how you can call 2 years of support an LTS with a straight face,
it's taking the piss out of the term. The LTS of the OS I'm likely to run it
on is supported for 8 years. 2 years isn't even enough time to finish many
projects on the same LTS it started on.

At work we've got 30-year-old C/C++ code bases that still run; they'll
probably run for another 20 at least. We've got 20-year-old Python code that
still runs (for now) and we've got 20-year-old C# projects that still run.
That last one will never get rewritten in .NET Core, in part because they've
pissed away the stability the framework had. It would be crazy to use tools
with 2 years of support for any of those projects.

> Don't confuse "why waste money maintaining working software" lines of
> reasoning with a sign of respect for stability.

Why should they waste money maintaining working software when there are stable
options available? What does upgrading to asp.net core get them? Why should
tens of thousands of companies waste money modifying working software just
because someone on the core team thought the existing API was inelegant or
that compatibility was too hard to maintain?

------
bob1029
I think going back to basics would be a really good idea for a lot of people
in software. It seems like modern developers are more disconnected than ever
from the reality of the hardware situation sitting right next to them. I
believe there was a post on the front page today detailing a certain 22ms
Hello World execution...

------
yowlingcat
```

First of all, it does not take "that much machine" to serve a fair number of
clients. Ever since I wrote about the whole Python/Gunicorn/Gevent mess a
couple of months back, people have been asking me "if not that, then what". It
got me thinking about alternatives, and finally I just started writing code.

```

Another day, another questionable premise for a blog post. No,
@rachelbythebay, the question is not "if not that, then what", and the answer
is not reinventing the wheel. The question is just "what ___" -- what do you
plan on doing, what does your software need to do, what does it need to
support? Use the right tool for the right job. If you have a language you're
proficient in and with an ecosystem that supports you developing something
rapidly, it's borderline malpractice not to start there. When you need to
optimize, optimize then. Maybe that means you carve out a subcomponent into a
new service, and you choose a language purpose built for speedily doing what
you need. Maybe it means a lot of things, but it doesn't mean throwing out
the baby with the bathwater and setting out to recreate the baby, the bathtub,
and the bathwater from scratch to answer the question of why your tub is
overflowing.

I wish this blog post were about solving real engineering problems instead of
writing code to provide mediocre answers to poor questions.

~~~
Aeolun
> If you have a language you're proficient in and with an ecosystem that
> supports you developing something rapidly, it's borderline malpractice not
> to start there.

I think the point of this post is that most new programmers do not _know_
things can be faster than their monstrous JS blob. The solution to slow
requests is _more servers_ instead of fixing the code.

~~~
Doxin
> I think the point of this post is that most new programmers do not know
> things can be faster

Related to this: programmers who know an ORM or two but never learn SQL
proper.

At my job I refactored a giant, slow, memory-hungry reporting task into a
single SQL query. It used to take 10s of minutes to collate 1000s of
datapoints. Now it takes 10s of _milliseconds_ to collate a factor of 10 more
data. Never mind after I added some indexes to speed the thing up.

Knowing about the layer below the abstraction you're working at can be rather
useful at times.
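A toy sqlite3 version of that kind of refactor (table and column names invented for illustration): instead of dragging every row into the application and aggregating there, one GROUP BY hands the collation to the database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE datapoints (sensor TEXT, value REAL)")
db.executemany(
    "INSERT INTO datapoints VALUES (?, ?)",
    [("a", i) for i in range(1000)] + [("b", 2 * i) for i in range(1000)],
)

# ORM-loop style: fetch every row, aggregate in application code.
totals = {}
for sensor, value in db.execute("SELECT sensor, value FROM datapoints"):
    totals[sensor] = totals.get(sensor, 0) + value

# Single-query style: the database does the collation in one pass.
totals_sql = dict(
    db.execute("SELECT sensor, SUM(value) FROM datapoints GROUP BY sensor")
)

assert totals == totals_sql  # same result; far fewer rows crossing the boundary
```

The real win over a network is that only one row per group crosses the wire, and the database can use its indexes and statistics to plan the scan.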

~~~
yowlingcat
What's telling here is that you refactored the individual task into a single
query, and nothing outside of that. That's fantastic! You scoped your work
into a small unit and derived a significant impact. You maximized your impact
to effort ratio.

Notably, you did NOT decide to write your own dialect of SQL to "scratch an
itch". Knowing the layer below the abstraction you're working at can be useful
at times, but only in relation to understanding the context of how all the
layers fit together and making effective, pragmatic decisions. Pointless
rewrites are anything but. Good on you for avoiding that impulse and doing the
right thing.

------
sytringy05
Compare this to the Enterprise platform I am dealing with on a current
project, which has 4 EC2 nodes with 8 CPUs and 32 GB RAM.

I can DoS it with a single Java client running 50 threads. If I use 100, the
p95 shoots up to 30-40 seconds.

But the kicker is that no one (other than me) really cares. 50 concurrent
threads is probably around the peak load it will get in prod, and various
people involved think why bother trying to fix it?

It's driving me nuts.

Thanks for listening

~~~
Aeolun
Oh man, don’t remind me. We have a bunch of GraphQL proxies in ECS that
somehow cannot handle more than 5 connections each, so naturally the solution
is to just spin up 19 more of them to get to 100 concurrent connections...

All of them sit at 1% cpu as well.

~~~
rbinv
That actually made me laugh. The time savings can't possibly be worth it.

------
l0b0
The only endpoint to ever do anywhere near as little work as this example
would be the heartbeat. Most of the cost comes from a combination of a bunch
of things _other_ than simply fetching some data from a local device:

- Web-anything is fetched off of a DB nowadays. That's another crapton of
latency, because a) it's just easier to understand its characteristics if it
runs on a separate system and b) most companies IME have either no DBAs at all
or the DBAs have no time to look in depth at every system being built. So the
DB resources are vastly underutilized, and the DB itself is badly understood.

- Cache invalidation is _still_ the hardest problem, and every caching
framework I've looked at seems to gloss over that part. Just cache all the
things and hope people retry enough times to get the latest update. I would
love someday to work on a system where things are aggressively cached at every
level _and_ invalidated at every level and with perfect granularity.

- Building for web scale from the beginning is premature optimization for the
vast majority of companies. In the vanishingly unlikely scenario that the
company actually grows 100-fold or more it makes sense to start investing
heavily in performance. Of course, this also has the knock-on effect that the
vast majority of software developers never get anywhere near a web scale
system. OTOH it creates jobs for millions of developers, some of whom might
end up building at scale someday.

Another elephant in the room is that building _anything_ at web scale is just
not something anybody straight into the workforce is anywhere near qualified
for. We desperately need more focused learning (mentoring, pairing, etc.)
across the board to bring everybody up to speed faster.

------
whorleater
The typical response to these types of posts is "oh, your /toy/ server doesn't
account for x, y, z in my use case, like DDoS, network issues, etc." But how
many people _actually_ handle those cases in your production application? I
can say that for the majority of the applications I've written at large
companies handling significant traffic, API compatibility was far higher on
the priority list than the cases that people often bring up.

IMO she's right. I wish we hadn't wrapped gunicorn and gevent around some of
our services. It certainly would've made my life easier and the services
faster.

~~~
kragen
Also it sounds like Rachel's server does account for DDoS and network issues.

------
ivalm
What WSGI server do people recommend for Python? I've been using gunicorn, but
this made me think of alternatives. A quick Google search found this benchmark
[0]. Is it really true that bjoern is much quicker? It seems all the other
WSGI servers are roughly equivalent.

[0] -
[https://www.appdynamics.com/blog/engineering/a-performance-a...](https://www.appdynamics.com/blog/engineering/a-performance-
analysis-of-python-wsgi-servers-part-2/)

~~~
DizzyDoo
I'm using Waitress in production:
[https://docs.pylonsproject.org/projects/waitress/en/stable/](https://docs.pylonsproject.org/projects/waitress/en/stable/)
Its main thing is that it's really simple; worth a look if you're using
Django.

~~~
CopyZero
going to look at this in depth tomorrow. Thanks

~~~
X-Istence
The thing to note is that waitress is not built for the highest speed, or the
fastest, or anything along those lines.

Its primary use case is that it is pure python, doesn't rely on any specific
libraries or compilers to run/build, and is a threaded WSGI implementation so
it uses Python threads to run a WSGI app.

It works well for what it needs to do, and hopefully it is fairly robust. I've
personally run waitress directly facing the internet, but will readily admit
that in most cases running it behind a load balancer is a good idea,
especially since it doesn't support SSL out of the box (yet, I should say,
it's on my roadmap).

It won't win any speed contests and it won't win performance contests, but it
holds its own.

If you have any issues, please drop by
[https://github.com/pylons/waitress/issues](https://github.com/pylons/waitress/issues)
and I'll see if I can help you out :-)

------
xakahnx
The general attitude here reminds me a bit of the following post from the
architect of the Varnish proxy. I think the attitude comes down to the fact
that modern kernels, and the foundations of network programming in general,
are pretty strong. We should trust them more.

[https://varnish-cache.org/docs/5.2/phk/notes.html](https://varnish-
cache.org/docs/5.2/phk/notes.html)

~~~
mapgrep
I would argue you can say the same about the foundations of RDBMSes as well.
People build similarly elaborate caches around those; for example, Rails has a
“russian doll” cache layer built in, not realizing how much time has gone into
developing well-tuned caches within the database system itself, which simply
needs to be allocated sufficiently large RAM.

~~~
tdeck
Russian doll caching also caches view rendering which can be non-trivial in
Rails.

------
simfoo
Honest question: why go through the hassle of multiplexing the waiting in a
single thread only to dispatch to a thread per client anyway? Simply using
blocking IO for the clients in those threads should be much simpler, right?

~~~
rachelbythebay
If you get stuck in read(), you can't do neat things like waking up when it's
time to kick a client for being idle, doing other housekeeping, or cleanly
shutting down the whole thing in a timely fashion. When I ^C the server, it
sends the same wake condvar-poke but it twiddles the flags so the worker shuts
down instead.
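The difference can be sketched with the stdlib `selectors` module (epoll on Linux). This is an illustrative skeleton, not Rachel's actual code; the timeout on the poll call is what makes the idle-kick and clean-shutdown housekeeping possible:

```python
import selectors
import socket
import time

IDLE_LIMIT = 30.0   # illustrative idle cutoff, in seconds
sel = selectors.DefaultSelector()   # uses epoll on Linux
last_seen = {}                      # client socket -> last activity time
shutting_down = False

def listener_loop(listener: socket.socket) -> None:
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, "accept")
    while not shutting_down:
        # The timeout is the whole point: even with zero traffic, we regain
        # control at least once a second to do housekeeping.
        for key, _events in sel.select(timeout=1.0):
            if key.data == "accept":
                conn, _ = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, "client")
                last_seen[conn] = time.monotonic()
            else:
                last_seen[key.fileobj] = time.monotonic()
                ...  # hand the ready fd off to a worker thread
        now = time.monotonic()
        for conn, seen in list(last_seen.items()):
            if now - seen > IDLE_LIMIT:      # kick idle clients
                sel.unregister(conn)
                conn.close()
                del last_seen[conn]
```

A thread blocked in `read()` has no equivalent of that once-a-second wakeup without signals or per-connection timers.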

~~~
pdonis
Yes, using epoll with nonblocking I/O is better than blocking I/O on each
worker thread. But that basically means that you are doing asynchronous
programming--i.e., the exact same thing that the wonky Python/Gunicorn stack
you described is doing! You're just doing it with better attention to
important details.

Here, to me, is the key item:

 _The "listener" thread owns all of the file descriptors (listeners and
clients both), and manages a single epoll set to watch over them._

This is exactly what any async server does: it centralizes all the file
descriptor management and handling in one place, and only uses workers
(whether they are threads or "green threads" or whatever) to read from/write
to fd's that are marked as ready in the epoll set.

For your case, unless I'm misreading something, what the workers are doing in
between the read/write is CPU intensive (or at least it's CPU work and not I/O
work, even though it's not very "intensive" CPU work), so actual OS threads
are a better choice for the workers since you can't rely on cooperative
scheduling.

If what the workers were doing was I/O work (for example, sending a request to
a remote database and waiting for a response), "green threads" would work fine
(since their only real purpose would be to organize the I/O--the actual fd's
are going to be managed by the central server that manages all the fd's and
checks which ones are ready for read/write). And one definitely should not try
to run "green threads" for the same server in multiple O/S threads (or worse
still, multiple OS processes). For an I/O bound server, one shouldn't need to
anyway.

~~~
underdeserver
Better attention to important details is what makes or breaks a library.

~~~
swsieber
Exhibit A: Dropbox (attention to detail and executing it)

------
robrtsql
> Ever since I wrote about the whole Python/Gunicorn/Gevent mess a couple of
> months back, people have been asking me "if not that, then what". It got me
> thinking about alternatives, and finally I just started writing code.

I actually want to know: then what? As a web developer who usually reaches for
Django or Flask with Gunicorn because I just don't know any better, is there a
better stack that doesn't face these problems? Or is this a 'call to action'
for somebody to build a better web server that follows this advice?

~~~
EdwardDiego
> is there a better stack

Apparently, it's C, or ...assembly? According to the comment above yours.

Because developer time is apparently cheaper than CPU cycles. Weird.

~~~
kenhwang
Every time I read one of her posts, I think: wow, she must either work for
literally dirt cheap or her code is expected to run on several thousand
machines. Most projects just don't reach the scale or steady state where
developer time is cheaper than machine time. It's fun trying to squeeze the
last drop of blood from the stone, but it rarely makes economic sense.

~~~
disgruntledphd2
To be fair, I believe she spent the guts of a decade at Google and Facebook as
a Production Engineer, so this isn't really that surprising.

------
WilliamEdward
Actually, now that I'm re-reading this, it's a disaster. She claims to write
this to help newcomers, yet everything is so cryptic.

`it kicks off a "serviceworker" thread to handle it.`

It doesn't even tell me how I should do this. I don't know what
'serviceworker' code looks like. I don't know what 'kicking off' means.

This post reads like it was made for maybe 3 or 4 people in the world who
truly 'get it' and if you aren't in that elite club you're a terrible
engineer, apparently. There's not even a lick of example code to get an idea
of what is going on. Engineering is a big word that shouldn't be used in this
blog post.

------
lazyjones
The problem with this kind of "benchmark" is that it doesn't measure anything
relevant for realistic situations, where thousands of connections will behave
in arbitrary, unexpected ways, like just stalling, and real web applications
have abysmal worst-case behaviour in the face of I/O, cron jobs running in the
background, network hiccups, etc. You don't provision your systems with only
best-case situations in mind. But sure, if you want to serve useless random
numbers without ever even hitting the disk, a Raspberry Pi is able to saturate
its network connection.

------
kccqzy
The C10k problem was challenging around the turn of the century. I suppose
it isn't anymore. I wonder how much CPU would be saved using an event-based
architecture.

~~~
milesvp
Even in 2007 when I was starting to cut my teeth on larger web traffic there
was a lot of discussion around serving 10k concurrent connections. I remember
being blown away by a graph showing high throughput at 70k concurrent by a
YAWS server, and started following Erlang as a result.

~~~
aeyes
In 2004 we had eDonkey servers handling 1M concurrent connections.

Code, changelog and a bit of history:
[https://lugdunum.shortypower.org/kiten.html](https://lugdunum.shortypower.org/kiten.html)

~~~
milesvp
Oh, wow, I vaguely remember running across eDonkey at one point. I don't think
I ever realized it could handle that kind of load. I was in a position to
mostly stick with Apache for various non-technical reasons, basically for long
enough that eventually Apache got to the point where it could handle the
traffic I needed to deal with, especially with a CDN in front of it.

------
bdcravens
"at least one stupid chat client that's actually a giant piggy web browser
running"

I wonder what app that could be :-)

~~~
trianglem
I’ll bite, is it slack?

~~~
haneefmubarak
In all honesty, there's a whole slew of platforms it could be:

- Slack

- Keybase

- Mattermost

- Discord

- Microsoft Teams

- Facebook Messenger (incl. Work)

All of these fine folks apparently use Electron or a similar technology...

~~~
tjalfi
Cisco Jabber also embeds a browser engine.

I think of Electron as Gresham's law in action.

------
losvedir
I know pcwalton has been banging the "just use threads" drum for a while now.
Rust used to use a green thread approach, but if I recall correctly, they
ripped all that out for just native threads with the idea that in _most_ cases
it's fast and efficient enough.

I remember when C10K was the big challenge, but even the naive approach of
spawning a thread per connection now can handle that.
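That naive approach, sketched as a stdlib Python echo server (a toy to show the shape of the model, not a production server): one blocking accept loop, one OS thread per connection.

```python
import socket
import threading

def echo(conn: socket.socket) -> None:
    # Blocking reads: while this thread waits, it costs only a stack and a
    # kernel scheduling entry -- cheap enough for thousands of them.
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)

def accept_loop(listener: socket.socket) -> None:
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=echo, args=(conn,), daemon=True).start()

def serve() -> socket.socket:
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # ephemeral port for the example
    listener.listen(128)
    threading.Thread(target=accept_loop, args=(listener,), daemon=True).start()
    return listener
```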

~~~
birktj
I believe the reason Rust dropped green threads was to get rid of the runtime
system and become more of a systems language.

Modern fast network code in Rust is based on async/cooperative multitasking
distributed over one worker thread per CPU, which works very well.

------
flerchin
Mmmmm. I set up a local kafka cluster serving millions of connections. I'm not
Rachel by the bay though.

------
nromiun
And tomorrow there will be an article describing an exploit in a web service
written in C/C++.

Does she even see the size of the batteries included in a framework like
Django? Here is some brand-new information: everyone knows Python/Django is
slow and inefficient (same for Rails), but people still use it for those
batteries.

It is easy to write some web service in a low level language that just
crunches some numbers for your benchmark. It quickly gets complicated when you
start thinking about accounts, security, passwords, databases, orders, carts
etc.

------
peterwwillis
The premise is that people don't understand their system, or software, and
need to be told how it works. So you have an admittedly ignorant person
telling other ignorant persons that there are possibilities out there.
That's... not really useful.

What is useful is to spend 2 days researching massively concurrent network
applications, and find the ones that were already written, and see how they
evolved over time. For the most part, it's just understanding your operating
system and network protocols. Once you learn how they all work, the answers
come quickly. This problem has been solved many times.

But for the most part, none of you need to know how this stuff works. You can
write the crappiest network server in the world, and there are still so many
other components you can wrap around it to make it scale that you never even
need to get close to network optimization. So you may have to run 10 instances
of your app; who cares? We have virtually unlimited everything these days. The
answer to a poorly performing app is "just throw more cloud at it".

Also, I feel the need to remind everyone that a single app instance handling
100K+ concurrent connections is a _terrible fucking idea._ What happens when
the app crashes? What happens when the hardware dies? What happens when you
need to, like, upgrade/restart the app? Several million SYN packets in 100ms,
and default kernel TIME_WAIT settings, do not make great bedfellows.

~~~
rachelbythebay
Yeah, I know, I'm terrible. I don't know jack about this stuff. I should just
stop now while I'm behind. Clearly.

------
rb808
Stuff like this works great until someone tries to DDoS you, or finds some
buffer overflow which takes over the box. I don't like the new world either,
but it was built this way for a reason.

------
z3t4
You must know a lot in order to write a slow performing program. If you just
do a simple program like in the article, it will be fast, even if you do
stupid things like spawning a new thread and file descriptor for each
connection.

Now in order to make it slow you must know enough stuff to introduce
complexity to the program. To make a slow program you probably have to learn
about frameworks, php, micro services, cloud databases, etc.

------
jpitz
Would love to see the code.

~~~
205guy
Funny, when she revealed what the threads were doing, I wanted to search for
Shakespeare in the output.

------
dangerface
Old box, dumb code, few thousand connections, no big deal.

Old box, complicated code, few thousand connections, good fucking luck.

------
rantwasp
should try this in erlang/elixir. i'm gonna bet you it could handle hundreds
of thousands of connections on a beefy machine (a million connections w/
optimizations)

~~~
unnouinceput
why tho? In real life if you're in need of handling millions of users per
second, I bet you're already part of FAANG, at which point you simply open
offices in each country and deploy local servers.

~~~
splintercell
Doesn't the WhatsApp story contradict what you're saying? They handled
millions of users per second, weren't part of FAANG, and this helped them get
acquired by FAANG at a high valuation for being a very nimble team and
architecture.

~~~
unnouinceput
How does that contradict me? I see it as exactly the opposite: they became
part of FAANG exactly because they managed to handle those users. And let's be
real, they handled millions of users/second after becoming part of Facebook.
Like it or not, FB is the no. 1 social network.

~~~
ClikeX
They got acquired in 2014. They hit 400M users in 2013. By then it was already
THE messaging app for several European countries.

I know the number of users doesn't equal users/second. But we shouldn't
pretend they weren't able to handle high traffic before Facebook acquired
them.

~~~
unnouinceput
400M users globally in 2013 and millions per second are not the same thing.
Simple math says 4x10^8 / 24 / 60 / 60 = ~4.7k users/second. That's about 3
orders of magnitude lower. Even if you point out that users are not spread
evenly over those 24 time zones, due to Earth being mostly water, it still
won't get to those millions per second.
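A quick check of the arithmetic above (granting the comment's assumption that each of the 400M users shows up once per day):

```python
daily_users = 400_000_000        # WhatsApp's reported 2013 user count
seconds_per_day = 24 * 60 * 60   # 86,400
average_rate = daily_users / seconds_per_day
print(round(average_rate))       # ~4,630 users/second if spread evenly
```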

~~~
splintercell
In 2012 they were able to scale to 2 million tcp connections on a single box
(and trust me, they actually needed that):

[https://archive.is/Jo6n](https://archive.is/Jo6n)

~~~
unnouinceput
for how much time? you see lab result and real life results are different.
Love it or hate it currently there is only one company that can deal with
millions of users per seconds and that's Google. No one else does it, not
Amazon, not Facebook including their whatsapp (a small math from my above
calculation says the number goes from ~5k to ~50k if you say whatsapp has 4
billion daily users, which I doubt it has), not Microsoft.

Each time a football Cup is on, Twitter goes down. Same for plenty of big
names when they launch a hyped service (Blizzard, for example). Scaling up
from the lab to dealing with those millions/second in real life, 24/7, is an
entirely different beast.

~~~
rantwasp
yeah no. a lot of people can do this. i don’t know why you think google is
special but it’s not. maybe you can share so that i can understand your angle

~~~
unnouinceput
already did. the bits about twitter and blizzard were 2 examples. you can
google (pun intended) for more examples

------
sandGorgon
Fastapi - a pleasant-to-use async python framework with strong type checking
and lots of stuff built in - comes in at rank 68 on Techempower database
update benchmarks.
[https://www.techempower.com/benchmarks/#section=data-r18&hw=...](https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=update)

It's about 30% slower than the fastest golang equivalent.

For the multiple-queries benchmark, it's about 40% slower.
[https://www.techempower.com/benchmarks/#section=data-r18&hw=...](https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query)

Once I start bringing in Numba to accelerate computational use cases... I
guess the difference will be even smaller.

------
throwaway894345
Unfortunately this approach precludes GIL languages like Python. If you're
able to use a language/runtime that is amenable to multithreading, then using
a thread per connection works fine for most use cases (and it's probably
easier than using whatever async/await interface your language has).
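For illustration, a minimal sketch of the thread-per-connection pattern, here
as a hypothetical Python echo server (not the article's code; and in CPython
the GIL still serializes the CPU work, which is exactly the limitation above):

```python
import socket
import threading

def handle(conn: socket.socket) -> None:
    # One blocking read/write loop per client; the thread sleeps in recv()
    # whenever the connection is idle, so idle connections cost almost nothing.
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)

def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    with socket.create_server((host, port)) as srv:
        while True:
            conn, _addr = srv.accept()
            # One daemon thread per connection: simple, but GIL-bound in CPython.
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

In a runtime without a GIL (Go, Java, C++), the same shape scales across cores
with no code changes.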

~~~
pmoleri
> probably easier than using whatever async/await interface your language has

Is it? I haven't used threads directly in a while, but I remember dealing with
synchronization issues, problems that just don't exist in single-threaded Node
with async/await.

I find the async Promise or Task to be a more useful abstraction than the
thread. Although you need threads, or a task dispatcher with a pool of
threads, if you need to run CPU-intensive stuff.
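For illustration, the same single-threaded model sketched in Python's asyncio
rather than Node (a hypothetical example): because only one task runs at any
moment, shared state can be mutated between await points without locks:

```python
import asyncio

counter = 0  # shared state; safe to mutate without locks on one event loop

async def handle_request(i: int) -> int:
    global counter
    await asyncio.sleep(0)  # a yield point, where another task may be scheduled
    counter += 1            # no race: only one task executes at a time
    return i * 2

async def main() -> list:
    # Tasks are the async analogue of Promises: fan out, then await them all.
    return await asyncio.gather(*(handle_request(i) for i in range(5)))

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8]
```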

~~~
throwaway894345
If you’re just handling requests there isn’t much for shared state, so you
don’t have to worry about synchronization.

~~~
pdonis
_> If you’re just handling requests there isn’t much for shared state_

No, but if your single listener/server thread is managing epoll for all your
file descriptors, you do have to have a way of synchronizing the worker
threads with it, so they know when and when not to read from/write to their
fd's. I assume Rachel is using some kind of semaphore or other threading
synchronization mechanism for this.
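One way to arrange that handoff, purely as a sketch and not necessarily what
Rachel's server does, is a single selector thread that parks ready sockets on
a queue; the queue's internal lock is the only synchronization the workers
need:

```python
import queue
import selectors
import socket
import threading

sel = selectors.DefaultSelector()   # epoll(7) on Linux under the hood
ready: queue.Queue = queue.Queue()  # listener -> worker handoff

def listener_loop() -> None:
    # A single thread owns the selector; workers never poll it themselves.
    while True:
        for key, _events in sel.select():
            sel.unregister(key.fileobj)  # stop watching until the worker is done
            ready.put(key.fileobj)

def worker_loop() -> None:
    while True:
        conn = ready.get()  # blocks until the listener hands over a ready socket
        data = conn.recv(4096)
        if data:
            conn.sendall(data)  # echo, then ask to be watched again
            # Re-registering from a worker thread is fine for a sketch; a
            # production server would marshal this back to the listener thread.
            sel.register(conn, selectors.EVENT_READ)
        else:
            conn.close()
```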

~~~
throwaway894345
Right, there is some shared state, but it’s very minimal.

------
bibyte
Does anyone have a link to the source code? I was thinking about writing an
asyncio version to compare, but I couldn't find it.

------
0xEFF
That same old box could handle a million connections without the threads. In
less than a hundred megabytes of RAM, too.

Try it today with NGINX.

------
di4na
Thank you, Rachel, for saying out loud what the erlang community has been
saying for years :)

This is basically the Erlang design. And yes, it works really well.

------
renatovico
uvicorn with quart in python 3.7 handle a 800-1000 req/s in gke :) porting
flask to quart take a 2 months of work

------
thenewwazoo
But this isn’t engineering. This is the IT equivalent of building a bridge and
driving successively larger trucks over it. In real engineering fields, you
can do predictive analyses based on prior empiricism. There’s none of that in
our fields until you’re talking about very small systems where, for example,
the stack consumption can be determined in advance and the scheduler can give
you guarantees about worst-case performance.

And this is to the detriment of all of us. That’s why you’ll never hear me
call myself an engineer.

~~~
wondringaloud
I'm glad someone else sees this the way I do. What the blog writer did was
tinker with something. They didn't engineer it.

I was a mechanical engineer prior to switching to software. As a general rule,
the things we do in software are very distant from engineering.

~~~
hedora
I think the analogy to mechanical engineering is something like this:

A seasoned engineer notices that everyone is only building suspension bridges
all of a sudden.

They point out that you can span a stream with some bricks or rocks and a bit
of mortar, and are ridiculed.

Next, they build a highway overpass out of concrete pylons, and stress test it
to 10x the necessary engineering load, and point out it cost 10% as much as a
typical contemporary suspension bridge. That’s this article.

~~~
kragen
Or vice versa: they notice that everyone is only building concrete bridges all
of a sudden, disregarding the traditional and much cheaper approach of hanging
boards under some rope handrails; upon being ridiculed, they build a
suspension bridge across a river for 10% as much as a concrete bridge.

