
We have to talk about this Python, Gunicorn, Gevent thing - emsal
https://rachelbythebay.com/w/2020/03/07/costly/
======
orisho
The problem that's described here - "green" threads being CPU bound for too
long and causing other requests to time out - is one that is common to
anything that uses an event loop and is not unique to gevent. node.js also
suffers from this.

Rachel says that a thread is only ever doing one thing at a time - it is
handling one request, not many. But that's only true when you do CPU bound
work. There is no way to write blocking-IO-style code at that scale without
some form of event loop underneath (gevent, async/await). You cannot spin up
100K native
threads to handle 100K requests that are IO bound (which is very common in a
microservice architecture, since requests will very quickly block on requests
to other services). Or well, you can, but the native thread context switch
overhead is very quickly going to grind the machine to a halt as you grow.
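
A minimal sketch of that failure mode under gevent (pure-CPU work never
reaches the hub, so nothing else gets scheduled until it finishes):

    import gevent

    def cpu_hog():
        # never yields: no IO, no gevent.sleep(), so the hub is starved
        total = 0
        for i in range(50_000_000):
            total += i
        return total

    def heartbeat():
        for _ in range(5):
            print("tick")  # stops ticking while cpu_hog runs
            gevent.sleep(0.1)

    gevent.joinall([gevent.spawn(heartbeat), gevent.spawn(cpu_hog)])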

I'm a big fan of gevent, and while it does have these shortcomings, they are
there because it's all bolted on top of Python, a language which started out
with the classic concurrency model (native threads) rather than this one.

Golang, on the other hand, doesn't suffer from them as it was designed from
the get-go with this threading model in mind. So it allows you to write
blocking style code and get the benefits of an event loop (you never have to
think about whether you need to await this operation). And on the other hand,
goroutines can be preempted if they spend too long doing CPU work, just like
normal threads.

~~~
CameronNemo
Can't a CPU bound task just make itself preemptible in Python by calling some
dummy async function, like asyncio.sleep(0)?

~~~
anaphor
Doesn't that mean you have to somehow know when it's been running too long and
then yield back control somehow? What if the slow operation is something
that's atomic, like multiplying some huge numbers?

~~~
CameronNemo
Well the example given is decoding JSON. If that is happening in a long loop,
you can yield once per loop and be safe. Not all problems are neatly broken
apart like that, but in those cases how much of a chance does the server have
to not timeout regardless, you know?

Note that once per loop might be too often, but you can just measure how long
a loop run typically takes and compare to how soon you want to preempt the
task, then yield at the right interval.
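
Concretely, something like this (a sketch; expensive_step() is a hypothetical
stand-in for the per-item work):

    import asyncio
    import time

    async def decode_many(items):
        results = []
        last_yield = time.monotonic()
        for item in items:
            results.append(expensive_step(item))  # hypothetical CPU work
            # yield roughly every 10 ms instead of every iteration
            if time.monotonic() - last_yield > 0.010:
                await asyncio.sleep(0)  # give other tasks a turn
                last_yield = time.monotonic()
        return results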

~~~
emj
Seems like abstractions will bite you there: most code will just do some
variant of cool_library.unmarshall(request), and those libraries will not have
the same mechanism for yielding that you have.

~~~
TeMPOraL
Abstractions are meant to be broken! One could probably work around this
problem by adding new functions to cool_library, or modifying existing ones,
whose code would be copy-pasted from the library but with some
asyncio.sleep(0) calls spliced in strategic places :). For legacy projects, it
may make more sense to cheat like this than to rewrite the whole project in a
saner tech stack.

------
orf
> Go back and look. I said that it forks and then it imports your app. Your
> app (which is almost certainly the bulk of the code inside the Python
> interpreter) is not actually in memory when the fork happens. It gets loaded
> AFTER that point.

You can just pass `--preload` to have gunicorn load the application once. If
you're using a standard framework like Django or Flask and not doing anything
obviously insane then this works really well and without much effort. Yeah I'm
sure some dumb libraries do some dumb things, but that's on them, and you for
using those libraries. Same as any language.
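
For reference, preloading is just a flag (a sketch, assuming a WSGI app
importable as myapp.wsgi:app):

    gunicorn --preload --workers 4 --worker-class gevent myapp.wsgi:app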

If you want to stick your nose up at Python and state outright "I will not
write a service in it" then that's up to you, it just comes across as your
loss rather than a damning condemnation of the language and its ecosystem
from Rachel By The Bay, an all-knowing and experienced higher power. I guess
everyone else will keep quickly shipping value to customers with it while you
worry about five processes waking up from a system call at once or an extra
150MB of memory usage.

~~~
cakoose
Even if you manage to preload everything you need, Python's reference counting
mechanism will cause everything to be copied anyway.

Every time you access an object, its reference count is mutated, which will
cause its memory page to be copied.

There are workarounds if you're willing to mess with the Python interpreter:
[https://instagram-engineering.com/copy-on-write-friendly-
pyt...](https://instagram-engineering.com/copy-on-write-friendly-python-
garbage-collection-ad6ed5233ddf)

~~~
pas
It still helps with the loaded .so files and whatnot, and with the code
objects and other things that are immutable. (Or does CPython refcount those
too?)

~~~
pdonis
_> The code objects and other things that are immutable. (CPython refcounts
those too?)_

CPython refcounts all objects. Refcounting is not required because of
mutability; it's required because the interpreter needs to know when an
object's memory can be reclaimed for something else.

I don't know if code objects specifically would have their refcounts mutated a
lot, since typically they're only referenced by one object, the function that
they're the code for. But _function_ objects will have their refcounts mutated
every time the function is called, since that sets up a stack frame that grabs
a reference to the function object and then releases it when the function
returns.
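
You can watch this with sys.getrefcount (a quick sketch; the numbers it
reports include getrefcount's own temporary reference):

    import sys

    def f():
        # the active call holds an extra reference to f, so the count
        # observed here is higher than the count at rest
        return sys.getrefcount(f)

    print(sys.getrefcount(f), f())  # the second number is larger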

------
nodamage
Yes, after reading through the article it's not very clear to me what the
actual problem is with using Python/Gunicorn/Gevent.

The author seems to be saying something about how if a worker is busy doing
CPU intensive work (is decoding JSON really that intensive?) then other
requests accepted by that worker have to wait for that work to complete before
they can respond, and the client might timeout while waiting?

If that's the case:

1\. Wouldn't this affect any language/framework that uses a cooperative
concurrency model, including node.js and ASP.NET or even Python's async/await
based frameworks? How is this problem specific to Python/Gunicorn/Gevent?

2\. What would be a better alternative? The author says something about using
actual OS-level threads but I thought the whole point of green threads was
that they are cheaper than thread switching?

~~~
alrs
Decoding JSON in Python without using a C library is indeed CPU-intensive.

~~~
pdonis
Python's standard library json module uses a C extension module for the CPU
intensive stuff.
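
An easy way to check what you're actually running (a sketch;
json.scanner.c_make_scanner is None when the _json C extension is missing):

    import json, json.scanner, timeit

    print("C scanner:", json.scanner.c_make_scanner is not None)
    doc = json.dumps({"key%d" % i: list(range(50)) for i in range(1000)})
    print("100 decodes:", timeit.timeit(lambda: json.loads(doc), number=100), "s")

Note that even with the C scanner, a single json.loads holds the GIL and never
yields to an event loop for the duration of the decode.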

~~~
hamburglar
Yes, decoding JSON in python is much more efficient if you are not _actually_
decoding JSON in python.

------
benreesman
It seems to me that this submission is getting a lot of blowback in the
comments for 1) the style and 2) the implication that wiring up Python
services with HTTP is bad engineering. I don’t think this is productive.

On the first point, yeah Rachel’s posts are kinda snarky sometimes, but some
of us find that entertaining particularly when they are highly detailed and
thoroughly researched. I’ve worked with Rachel and she’s among the best “deep-
dive” userspace-to-network driver problem solvers around. She knows her shit
and we’re lucky she takes the time to put hard-earned lessons on the net for
others to benefit from.

As for “microservices written in Python trading a bunch of sloppy JSON around
via HTTP” is bad engineering: it is bad engineering, sometimes the flavor of
the month is rancid (CORBA, multiple implementation inheritance, XSLT, I could
go on). Introducing network boundaries where function calls would work is a
bad idea, as anyone who’s dealt seriously with distributed systems for a
living knows. JSON-over-HTTP for RPC is lazy, inefficient in machine time and
engineering effort, and trivially obsolete in a world where Protocol
Buffers/gRPC or Thrift and their ilk are so mature.

Now none of this is to say you should rewrite your system if it’s built that
way, legacy stuff is a thing. But Rachel wrote a detailed piece on why you are
asking for trouble if you build new stuff like this and people are, in my
humble opinion, shooting the messenger.

~~~
diebeforei485
> we’re lucky she takes the time to put hard-earned lessons on the net for
> others to benefit from.

I genuinely don't see much of a lesson to learn from this particular blogpost,
and it appears neither did many others in HN. If there is one, beyond "don't
use x", it's hard to find it.

I get the impression that this particular post is being upvoted to the top of
HN because of who the author is, not necessarily because this post itself has
value. This results in a whole bunch of others reading it, wondering why
they're wasting their time with such a rambling post.

~~~
kragen
Here are some lessons you might learn:

1\. Load before forking.

2\. Remember that green threads tend to have problems with fairness of
scheduling.

3\. JSON decoding gobbles CPU.

4\. Scheduling fairness problems increase response time variance.

4½. Green threads also increase it.

5\. Don't forget to take retries of timed-out requests into account in
protocol design; idempotence is the simplest solution when you can use it.

6\. Wake-one semantics to avoid the thundering herd are important for
performance when you have multiple threads, and Gunicorn has that thundering
herd problem, so you probably don't want to be running it this way on a
64-core box with hyperthreading. (The problem is of course less severe than it
was for Apache because the green threads don't thunder.)

7\. Gevent uses epoll, not select, poll, or RT signals.

8\. EAGAIN and SIGPIPE if you didn't know about those. ( _Somebody_ is in
today's lucky ten thousand.)

9\. What kinds of mechanisms “tend to show up given time in a battle-tested
[network server] system.”

10\. Your systems don't have to be fragile pieces of shit.

~~~
rachelbythebay
Very nice. Thank you.

~~~
kragen
I'm not sure whether the person I was replying to is The One to whom all these
things are too obvious to be worth mentioning, or if these were too implicit
for them to notice, or a combination. Either way, no, thank _you_ for writing
it.

------
tgbugs
This is a great review of what is going on "behind the scenes."

As the maintainer of about 5 little services with this structure I have vowed
never to write another one. The memory overhead alone is a source of eternal
irritation ("Surely there must be a better way....").

Echoing other commenters here, the real cost isn't actually discussed. Namely
that there is a solution to some of these problems (re long running tasks?),
but it carries with it a major increase in complexity. Its name is Celery and
oh boy have fun with the ops overhead that that is going to induce.

A while back I did some unscientific benchmarking of the various worker
classes for python3.6 and pypy3 (7.0 at the time I think?). Quoting my summary
notes:

1\. "pypy3 with sync worker has roughly the same performance; gevent is
monstrously slow; gthread is about 20 rps slower than sync (1s over 1k
requests); sync can get up to ~150rps"

2\. "pypy3 clearly faster with tornado than anything running 3.6"

3\. "pypy3 is also about 4x faster when dumping nt straight from the database,
peaking at about 80MBps to disk on the same computer while python3.6 hits
~20MBps"

I won't mention the workload because it was the same for both implementations
and would only confuse the point, which is that there are better solutions out
there in python land if you are stuck with one of these systems.

One thing I would love to hear from others is how other runtimes do this in a
sane and performant way. What is the better solution left implicit in this
post?

~~~
coleifer
With python the only thing that matters _is_ the workload.

------
seemslegit
"I will not use web requests when the situation calls for RPCs"

I'm surprised how often devs treat this distinction as architecturally
meaningful. Web requests are just RPCs with some of the parameters
standardized and multiple surfaces for parameters and return values - query
string, headers, body. This is completely orthogonal to the strategy used to
schedule IO, concurrency, etc.

~~~
jchw
I mean... you do have to parse text formats. HTTP parsing may be a solved
problem, but that doesn't mean the overhead or complexity of doing so
disappears.

Also, TLS is not really ideally lightweight for RPCs, but you should
absolutely encrypt your RPC traffic (imo.) So I really think the whole stack
is out.

(P.S.: If you are wondering what kinds of 'lightweight' replacements for TLS
exist, I think my personal favorite attempt is CurveCP, although it is a bit
dated nowadays. I wouldn't often recommend people roll their own, but you
could certainly do something simple with NaCl/libsodium directly. Maybe QUIC
also fits the bill?)

~~~
seemslegit
There is nothing that says that "RPC" can't do multiple request/response
cycles over an existing open (and encrypted) connection rather than initiate a
new one for every call, just like HTTP. Or even pipeline them like HTTP/2.0

~~~
jchw
HTTP 2 is more reasonable. But by the time you get to HTTP 3 you're just doing
HTTP 2 over QUIC. At which point, why not just send RPC payloads directly over
QUIC?

~~~
seemslegit
Because what I have in place is good old HTTP 1.1?

~~~
jchw
That's circular. I can't argue anything from this point. "Why should I get an
electric car when I have a good old gas car in the driveway?" I don't have an
answer. I do have answers for why one is better than the other. APIs work over
HTTP in spite of limitations, not because of good synergy. I think gRPC is the
most reasonable implementation of such (disclaimer: I work for Google, but not
on gRPC and I've used gRPC before I worked at Google) but I still think it is
overkill for many people. If you are using HTTP+REST+JSON and it works fine
for what you are doing, then fine - there's an ecosystem already built around
it. But the kinds of things people do with lighter weight and more efficient
RPC layers literally aren't doable over standard HTTP/1.1 and REST. It enables
stuff you wouldn't think of, when you can measure the absolute overhead in
bytes. (As an example, I'm not aware of anyone actually doing this, but it
would almost certainly be possible to forward low level signals like USB or
perhaps even PCI express packets over a lightweight RPC layer, and get all of
the encryption/access control/etc. you already have in your stack.)

Answers for why HTTP/1.1 is a poor fit:

\- Text format requires text parsing. How long do you limit the header lines?
What transport compression do you support? Text parsing is inefficient
compared to binary formats.

\- A lot of difficult to understand behavior. When do you send 100 Continue,
what do you do when you receive it? What happens when you are on a keep-alive
connection and there's no Content-length? (There's a whole flow chart for
something simple like this.) etc.

\- A lot of cruft. Like chunked encoding is weird. Trailers are also weird.
What happens when a header is specified twice?

Answers for why HTTP/2 is still a poor fit:

\- What are the headers even for? You now have this entire section of your
request that doesn't matter, with its own compression scheme called HPACK.
Why?

\- Server push. It's nice that you have bidirectional streams, but this is
clearly designed for browser agents. gRPC repurposes this for bidirectional
streaming as it should be, but...

\- ...Often times, hacks like that lead to the worst problem: You did all of
this work to use HTTP as an RPC layer, and you can't even use it in a browser
because the sane things you do for your backend might not be compatible. In
gRPC there's a special layer for handling this, but it's a lot of additional
cruft.

HTTP/REST is great because there's a huge ecosystem, but that's not even a
solid win due to the complexity. As an example, years ago I ran into huge
problems with Amazon ELB because it was buffering my entire request and
response payloads, and imposing its own timeouts on top. All documented
behavior, but you can't just plug in this HTTP thing and hope for it to work.
Basically anything in the middle that also speaks HTTP has to be carefully
configured. Again, leading to doubt over the whole point of using a protocol
like HTTP. There are rules for what should be GET, PUT, POST, DELETE, and yet
those all interact strangely. No payload in GET body, some software gets weird
about calls like DELETE, so sometimes you have to support POST for what should
be a PUT and so on.

And at the end of the day, all you really wanted was RPC payloads in both
directions, and you have all of this crap around it, and it's largely just
because web browsers exist, but none of this stuff even works well together.

It works OK if you don't really care much and just throw a software stack
together, but that doesn't mean it will be efficient, doesn't mean you won't
run into problems. I definitely prefer to go for simpler, and HTTP is not
actually simpler. It just has the benefit of having an existing ecosystem.

~~~
seemslegit
I don't think we disagree about anything here. If you wanna optimize for
maximal machine/network utilization, then optimize for that with gRPC or
equivalent; if you wanna optimize for a lean stack and have to use HTTP anyway
because you're on the web, then use (RPC over) HTTP. Both can be considered
more "efficient" depending on the setting and your constraints.

But the point was that contrasting web requests with RPC is a mistake of
category and has little to do with various IO handling and concurrency models
that the author was discussing.

~~~
jchw
Well, the thing is, I do agree with the author, though, on their point of not
using web requests for RPCs. I think we must be interpreting the author's text
differently.

~~~
seemslegit
Either that or she's lumping together two separate issues, or both.

------
cwp
Sigh. Yes. I have been there and done that (more or less) and it sucks. The
root problem is that data scientists really want to use Python for machine
learning, but wrapping a Python model in a service that uses CPU and memory
efficiently is really difficult.

Because of the GIL, you can't make predictions at the same time you're
processing network IO, which means that you need multiple processes to respond
to clients quickly and keep the CPU busy. But models use a lot of memory and
so you can't run all THAT many processes.

I actually did get the load-then-fork, copy-on-write thing to work, but
Python's refcounting and garbage collection keep writing into objects'
memory, which triggers copying and makes the processes gradually consume more
and more memory as the model becomes less and less shared. Ok, so then you can
terminate and re-fork the processes periodically, and avoid OOM errors, but
there's still a lot of memory overhead and CPU usage is pretty low even when
there are lots of clients waiting and...

You know I hear Julia is pretty mature these days and hey didn't Google
release this nifty C++ library for ML and notebooks aren't THAT much easier.
Between the GIL and the complete insanity that is python packaging, I think
it's actually the worst possible language to use for ML.

~~~
crimsonalucard
She's talking about green threads, which are different from regular threads
in python. Under nodejs/python-style green threads, only IO calls are
concurrent with the single computation task. There is no parallelism under
either style of threading unless you count concurrent IO as parallel.

She is basically complaining about a pattern that was popularized by NodeJS
and emulated in python by older libraries like gevent, twisted and tornado.
Currently python3 uses the async/await keywords as an API around the same
concepts implemented in the older libraries.

This has nothing to do with GIL.

~~~
cwp
In the case of the article, you are correct. I have a slightly different case
where I'm wrapping a scikit-learn model. We're NOT just calling another service
and waiting for a response, we're doing computation, in Python. So the GIL is
actually a problem.

------
ary
This is spot on. My one and only gripe is with this part:

> So how do you keep this kind of monster running? First, you make sure you
> never allow it to use too much of the CPU, because empirically, it'll mean
> that you're getting distracted too much and are timing out some requests
> while chasing down others. You set your system to "elastically scale up" at
> some pitiful utilization level, like 25-30% of the entire machine.

Letting a Python web service, written in your framework of choice, perform
CPU-bound work is just bad design. A Python web service should essentially be
router for data, controlling authentication/authorization, I/O formatting, and
not much else. CPU intensive tasks should be submitted to a worker queue and
handled out of process. Since this is Python we don't have the luxury of using
threads to perform CPU-bound work (because of the Global Interpreter Lock).
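
A sketch of the shape of that design, using a process pool as the
out-of-process worker (heavy_decode() here is a hypothetical stand-in for
whatever CPU-bound step you have):

    import asyncio
    import concurrent.futures
    import json

    pool = concurrent.futures.ProcessPoolExecutor(max_workers=4)

    def heavy_decode(payload):
        # the CPU-bound step; runs in a worker process, not the event loop
        return json.loads(payload)

    async def handler(payload):
        # the event loop stays free to route other requests while a
        # worker process burns CPU on this one
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(pool, heavy_decode, payload)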

~~~
ghostwriter
> Since this is Python we don't have the luxury of using threads to perform
> CPU-bound work (because of the Global Interpreter Lock).

You can have it with threads if the CPU-bound work is done inside a C
extension - [https://docs.python.org/3/c-api/init.html#releasing-the-
gil-...](https://docs.python.org/3/c-api/init.html#releasing-the-gil-from-
extension-code)
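
A sketch of what that buys you, using zlib (which, to my understanding,
releases the GIL around its compression loop for non-trivial inputs):

    import concurrent.futures
    import os
    import zlib

    blob = os.urandom(32 * 1024 * 1024)  # 32 MiB of incompressible input

    # with the GIL released inside zlib.compress, these four calls can
    # genuinely overlap on four cores despite running in threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(zlib.compress, [blob] * 4))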

------
cakoose
It seems to be a complaint against doing process-per-CPU.

Let's say your server has 4 CPUs. The conservative option is to limit yourself
to 4 requests at a time. But for most web applications, requests use tiny
bursts of CPU in between longer spans of I/O, so your CPUs will be mostly
idle.

Let's say we want to make better use of our CPUs and accept 40 requests at a
time. Some environments (Java, Go, etc) allow any of the 40 requests to run on
any of the CPUs. A request will have to wait only if 4+ of the 40 requests
currently need to do CPU work.

Some environments (Node, Python, Ruby) allow a process to only use a single
CPU at a time (roughly). You could run 40 processes, but that uses a lot of
memory. The standard alternative is to do process-per-CPU; for this example we
might run 4 processes and give each process 10 concurrent requests.

But now requests will have to wait if more than 1 of the 10 requests in its
process needs to do CPU work. This has a higher probability of happening than
"4+ out of 40". That's why this setup will result in higher latency.

And there's a bunch more to it. For example, it's slightly more expensive (for
cache/NUMA reasons) for a request to switch from one CPU to another, so some
high-performance frameworks intentionally pin requests to CPUs, e.g. Nginx,
Seastar. A "work-stealing" scheduler tries to strike a balance: requests are
pinned to CPUs, but if a CPU is idle it can "steal" a request from another
CPU.

The starvation/timeout problem described in the post is strictly more likely
to happen in process-per-CPU, sure. But for a ton of web app workloads, the
odds of it happening are low, and there are things you can do to improve the
situation.

The post also talks about Gunicorn accepting connections inefficiently and
that should probably be fixed, but that space has very similar tradeoffs
([https://blog.cloudflare.com/the-sad-state-of-linux-socket-
ba...](https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/)).

------
yowlingcat
I like the author's articles most of the time. While this article contains
some truths, I don't think it argues very persuasively for its conclusion.
Okay, these parts of the Python ecosystem don't work well together, and it's a
bad, unpolished experience. Fair, as with other criticisms of Python.

The question, however, is why one would use gevent at this point in Python's
evolution. There's async await now, and things like FastAPI. If you want to
use, say, the Django ecosystem, use Nginx and uWSGI and be done with it. Maybe
you need to spend some more resources to deploy your Python. Okay. Is that a
problem? Why are you using Python? Is it because it's quick to use and helps
you solve problems faster with its gigantic, mature ecosystem that lets you
focus on your business logic? Then this, while admittedly not great, is going
to be a rounding error. Is it because you began using it in the aforementioned
case and now you're boxed into an expensive corner and you need to figure out
how to scale parts of your presumably useful production architecture serving a
Very Useful Application?

Maybe you need to start splitting up your architecture into separate services,
so that you can use Python for the things that it does well and use some other
technology for the parts that aren't I/O bound and could benefit from that.
But that's not what this article is about. This article is about someone making
the
wrong choices when better choices existed and then making a categorical
decision against using Python for a service. I'd say that's what "we have to
talk about" if you ask me.

~~~
wrsh07
I've been working on a legacy internal python system that suffers from most of
the complaints here (and in the excellent COST paper Rachel links at the
bottom).

The problems alluded to are, yes, solvable in python. But they also seem
endemic in python systems.

When everyone who uses the tool uses it wrong, maybe it's not the user's
fault.

(That said, I generally do think there's a time and place for python systems
or web apps. That time is generally when speed and maintainability are
significantly less important than flexibility.)

~~~
cbsmith
> The problems alluded to are, yes, solvable in python. But they also seem
> endemic in python systems.
>
> When everyone who uses the tool uses it wrong, maybe it's not the user's
> fault.

Yes, though that doesn't mean it is necessarily the code's fault.

Honestly, I was very confused by this article, because I thought everyone
understood what was going on, the trade-offs involved, and how that ought to
impact your design decisions.

It's not that Gevent'd Gunicorn is intrinsically a bad thing. You're going for
cooperative multi-tasking/concurrency, so no preemptive multi-tasking support.
This creates potential challenges with fair scheduling if you have real-time
constraints like timeouts... so you design accordingly.

One of the advantages of this model is you do indeed need less memory (and
often a little less CPU) to handle high load levels. It's not like you are
intrinsically better off if you use Python in a forking model. You can still
end up so CPU bound that you timeout handling requests... the only difference
is you'll get fairer splitting of the CPU's time across tasks. It can actually
get worse if you get lost in an infinite series of context switches (yes,
there are ways to mitigate this problem... although they can create fair
scheduling problems... it's a natural tension), or worse still, start
swapping.

If the notion that running out of CPU might mean you have timeouts hasn't
occurred to you...

------
j88439h84
Using an ASGI server that supports async/await, such as Uvicorn, instead of
green threads, forking, etc, seems like a good idea these days. Also means you
can use Starlette which has a much nicer design IMO than some of the old
frameworks.

\- [https://www.uvicorn.org/](https://www.uvicorn.org/)

\- [https://www.starlette.io/](https://www.starlette.io/)
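
A minimal sketch of what that looks like (assuming a module named app.py, run
with "uvicorn app:app"):

    # app.py - async handlers run on uvicorn's event loop,
    # no gevent monkey-patching involved
    from starlette.applications import Starlette
    from starlette.responses import JSONResponse
    from starlette.routing import Route

    async def homepage(request):
        return JSONResponse({"hello": "world"})

    app = Starlette(routes=[Route("/", homepage)])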

~~~
rcarmo
I'm digging
[https://github.com/RobertoPrevato/BlackSheep](https://github.com/RobertoPrevato/BlackSheep)
myself, since I prefer the terseness of Bottle and stacking decorators to add
functionality to handlers.

Any of the above (or Sanic) can do ~3K RPS on a single core on a Raspberry Pi
(which is where I test things for portability, optimisation and a little fun),
and the RAM overhead is generally not that bad (I just did a little "hello
world" uvicorn/blacksheep app and I see 22MB resident/10MB shared per worker,
while one of my Clojure servers takes up over four times that...)

------
worik
This is exactly what I think.

Those below who complain about the complaints are missing the point.

We (computer programmers as a general class) have not learnt from history. We
keep reinventing wheels and each time they are heavier and clunkier.

What we used to do in 40K of scripts now takes two gigabytes in
python/django/whateverthehellelse. E.g. mailing list servers. Mailman3, hang
your head in shame!

------
ris
I don't disagree with any of this but

> "Why in the hell would you fork then load, instead of load then fork?"

In python it often seems to make little difference. The continual refcount
incrementing and decrementing sooner or later touches most everything and
causes the copy to happen whether you're mutating an object or not.

I've had some broad thoughts about how one would give cpython the ability to
"turn off" gc and refcounting for some "forever" objects which you know you're
never going to want to free, but it wouldn't be pretty as it would require
segregating these objects into their own arenas to prevent neighbour writes
dirtying the whole page anyway...

~~~
wrmsr
They took a step towards this with
[https://docs.python.org/3/library/gc.html#gc.freeze](https://docs.python.org/3/library/gc.html#gc.freeze)
but it doesn't go as far as disabling refcount touching outright. I've
experimented with doing that, both per-object and just globally, and the
results really were promising if your forkserver can keep up with providing
the necessarily much shorter-lived worker processes.
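
For reference, a sketch of where such a freeze could go with gunicorn
(assuming the server is started with --preload, so the app is imported in the
master before workers fork):

    # gunicorn_config.py (hypothetical; uses gunicorn's pre_fork hook)
    import gc

    def pre_fork(server, worker):
        # move everything allocated so far (the preloaded app) into the
        # permanent generation so the cyclic GC never walks those pages
        # in the children; refcount writes can still dirty them, though
        gc.freeze()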

~~~
ris
Thanks for this link - I had completely missed it (I think I was just
expecting to disable gc entirely or perform some rudimentary surgery on its
linked list)

------
tus88
Gotta love those people who fail to understand how things are supposed to be
used, fail miserably as a result, then throw the baby out with the bathwater
in a fit of tantrum.

Yes, Python has a GIL. Yes, lightweight threads are mostly good for IO bound
tasks. Yes, it can still be used effectively if you design your app correctly.

------
ahuang
I think this conflates a poor implementation of a webserver with
python/gunicorn/gevent being bad. There are a few (easy) things to do to avoid
some of the pitfalls she encountered:

> A connection arrives on the socket. Linux runs a pass down the list of
> listeners doing the epoll thing -- all of them! -- and tells every single
> one of them that something's waiting out there. They each wake up, one after
> another, a few nanoseconds apart.

Linux is known to have poor fairness with multiple processes listening to the
same socket. For most setups that require forking a process, you run a local
loadbalancer on box, whether it's haproxy or something else, and have each
process listen on its own port. This not only allows you to ensure fairness by
whatever load balance policy you want, but also lets you have healthchecks,
queueing, etc.

>Meanwhile, that original request is getting old. The request it made has
since received a response, but since there's not been an opportunity to flip
back to it, the new request is still cooking. Eventually, that new request's
computations are done, and it sends back a reply: 200 HTTP/1.1 OK, blah blah
blah.

This can happen whether it's an os threaded design or a userspace green-thread
runtime. If a process is overloaded, clients can and will timeout on the
request. The main difference is that in a green-thread runtime it's about
overloading the single process vs. utilizing all threads. You can make this
better by
using a local load balancer on box and spreading load evenly. It's also best
practice to minimize "blocking" in the application that causes these pauses to
happen.

>That's why they fork-then-load. That's why it takes up so much memory, and
that's why you can't just have a bunch of these stupid things hanging around,
each handling one request at a time and not pulling a "SHINYTHING!" and
ignoring one just because another came in. There's just not enough RAM on the
machine to let you do this. So, num_cpus + 1 it is.

Delayed imports (because of cyclical dependencies) are bad practice. That being
said, forking N processes is standard for languages/runtimes that can only
utilize a single core (python, ruby, javascript, etc.).

This is not to say that this solution is ideal -- just that with a small bit
of work you can improve the scalability/reliability/behavior under load of
these systems by quite a bit.

------
pdonis
The problem being described here isn't Python, gunicorn, or gevent; it's bad
programming. I'd be willing to bet there are systems out there written in C++,
Java, and Ruby that do the same dumb things. The solution is to not do dumb
things--to understand what your program is doing. It's perfectly possible to
do that in Python, gunicorn, and gevent.

~~~
_old_dude_
In the case of Java, the Selector API was introduced in Java 1.4 (2002) for
this exact reason: to avoid having all the threads wait on, and get notified
by, accept().

------
DevKoala
I'm in this crap situation atm, can attest. Currently maintaining a Python app
for
the delicate snowflakes whose years of math understanding somehow prevents
them from being able to learn a language that isn’t Python.

We have money, let’s just blow it. /s

------
fancyfredbot
I really like Rachel's blog and I think I understand the point she's making
here. However I think she sees it from the point of view of very large scale
services. In many cases you can have a solution ready more quickly with less
developer time if you use these technologies, and at smaller scale this more
than pays for the additional hardware you need to cope with the inefficiency.
In such cases writing services in python is pragmatic and sensible.

------
doctoboggan
I recently started playing around with Google Cloud Run and am running some
python/flask/gunicorn code in a docker container on the platform.

I noticed in the logs that I am getting a lot of Critical Worker Timeouts and
I am wondering if this has anything to do with it.

~~~
countbayes
We had that problem and hacked around it with the Dockerfile instructions
below, if you find a better solution that would be great :)

\--Dockerfile snippet--

    # Cloud Run concurrency is assumed to be set to 10 but we don't assume
    # that is exact. See
    # https://github.com/benoitc/gunicorn/issues/1801 so disabling concurrency.
    ENV GUNICORN_CMD_ARGS="-c gunicorn_config.py --workers 1 --threads 1 --timeout 120 --preload"

    CMD [ "gunicorn", "pkg.http:app" ]

    # Or just use Flask directly if concurrency is set to 1
    #CMD [ "python", "cmd/server/main.py" ]

~~~
doctoboggan
Thanks for the pointer. I was messing around with --preload and --timeout
flags and they seemed to work, although I think that isn't fixing the root
problem.

------
Matthias247
I’m not sure what the main point of the article is? Telling us that event
loops have problems? Sure, the lack of preemption can cause latency problems
in some tasks. But native threads have other issues - that’s why people use
event loops.

Is the message that epoll and co are not efficient enough? That’s also true.
API problems and the thundering herd are known, and not only limited to Python
applications as users. IO-completion-based models (e.g. through io_uring)
solve some of the issues.

Or is this mainly about Python and/or Gevent? If yes, then I don’t understand
it, since the described issues can be found in the same way in libuv, node.js,
Rust, Netty, etc.

------
Fazel94
I can relate to the writer; working with legacy sucks. This was my main take
on the blog post. The others are just brilliant ways of rationalizing why
other people suck and why there are people other than me:

Definitely, I am smarter than the guy who wrote this, because then I wouldn't
have these problems. (Or he is smarter and I just didn't ask him about his
rationale.)

What I design wouldn't run into these BS problems that I have to fix; it just
wouldn't run into problems generally. (Or it would have more problems than
this one.)

I've had these conversations with myself at least a thousand times, and then
it was always the case in the parentheses.

------
nemoniac
So... what's a good alternative? One that's relatively straightforward to
implement compared to the Python approach.

~~~
jerf
In this _very particular sense_ , almost anything else is better. Dynamic
scripting languages that are intrinsically single threaded because they were
single-threaded for the first 10-15 years of their lives, and it is virtually
impossible to retrofit true threading all the way from their basic runtimes
through all their libraries [1], are basically the pessimal case for this
particular problem.

This is not the whole story of the value of those languages. As the article
even alludes to, at small loads or with lots of care this can be made to
"work". But it is something that an engineer should know about them before
picking them up and using a tool for something it isn't really good at.

[1]: I add this caveat because I don't think there's anything about dynamic
scripting languages that makes them intrinsically difficult to thread any
moreso than any other category of language, it's just that by an accident of
history, they all come to us from the 1990s personal computer world, and they
all spent at least a decade cooking and setting and building libraries and
communities and developer skillsets before a serious need for threading was
even on the horizon.

~~~
samatman
It's a good caveat, because Lua, in particular, has fully-reentrant functions.
You can run a bunch of lua_States cooperatively or on a threaded basis without
problems. Everything the VM does, from the C side, receives a lua_State as the
first argument.

It's intrinsically single-threaded, yes. But each instance is quite small and
they stay out of each other's way. Add coroutines and there's a lot you can
safely do with Lua that's a real pain to accomplish with Python.

~~~
mhd
In the days of yore, I might've attempted to shoe-horn Tcl into such a
situation. Decent event-loop for distributing tasks and (like Lua, but unlike
Python/Ruby) more eager to escape to C for performance-sensitive tasks.

------
jasonhansel
I really think this should be solved at the OS level. Why is it so hard to
implement kernel threads in an efficient way? Threading shouldn't need to be
done in userspace.

~~~
asveikau
It's not hard. The very start of the article says she trusts proper kernel
threads more. But a bunch of these languages were designed and written in a
time before threads were much of a thing. So they fake threads in user mode
rather than fix the assumptions of the runtime.

Now, using kernel threads uncarefully will lead to different problems, which
are famously tricky... But what I mean is, it is not "omg kernel threads are
hard" that is the primary factor preventing direct access to them. It's that
the language runtime has a lot of baggage.

------
drenginian
Ok so if there’s a problem, what’s a solution?

If I use uWSGI, is the problem gone?

------
_bxg1
How does this compare with NodeJS, given the event loop and the managed way
that concurrent tasks happen there?

~~~
crimsonalucard
It's the same thing. She and many other people don't know it but she's
complaining about the same model that was introduced and popularized by
NodeJS.

Current python "green threads" use the keywords async/await as an api.
Underneath this api, the state of the art implementations use libuv (wrapped
in a python library called uvloop) the exact SAME C++ library that powers
nodejs.

------
diebeforei485
I find this post to be unintelligible. Given that it's been upvoted to the top
of HN though, can someone give a TL;DR of the intellectual value of this post?
It seems to be stepping through the details of what is going on while also
being rambling.

------
MoronInAHurry
Rachel's posts would be so much more useful if she would just say what she
meant, instead of twisting everything into knots to find a way to say it
backwards so she can be sarcastic and condescending while doing it.

I'm sure there's some useful information in here, but it's not worth digging
through the patronization to find it.

~~~
DoreenMichele
My feeling is that if you really wanted the author to improve, you would try
to connect personally, establish trust and then talk to them privately about
ways you feel they could improve their writing. And maybe you aren't
comfortable reaching out privately because it's a woman, so that could go
sideways (assuming you are male, which I don't actually know). Let me assure
you that if you have good intentions, publicly dogging someone because you
aren't comfortable reaching out privately is not a good substitute.

Seeing this comment at the top of the page on the highest ranked post on the
HN front page for the only woman programmer that I am personally aware of who
regularly makes the front page really feels like a kick in the gut and looks
like sexist garbage. And I would like to think better of HN than that.

~~~
nostrademons
This is the Internet, where you take whatever feedback you get with the
appropriate grain of salt and either choose to improve or not. Most people on
the public venues on the Internet - forums, blogs, comments, essays - are not
looking to build relationships or establish trust. (There are some exceptions
- I've made some great friendships with Internet friends - but they're usually
more private niche forums than blogs or other publications with a wide
readership.) They're looking to get their opinion out there, build a
readership, perhaps influence public discourse, and maybe get some feedback on
their ideas.

I've seen similar comments leveled at PG [1] and Zed Shaw [2], so I don't
think it's just sexism.

[1]
[https://idlewords.com/2005/04/dabblers_and_blowhards.htm](https://idlewords.com/2005/04/dabblers_and_blowhards.htm)

[2]
[https://news.ycombinator.com/item?id=9275526](https://news.ycombinator.com/item?id=9275526)

~~~
Misdicorl
Note that both your examples are

a) criticising content rather than delivery/tone

b) not the primary conversation around these two authors

Look to Linus Torvalds for a male example where delivery rather than content
is often the primary conversation. _That_ is how egregious the delivery must
be for a male to get the tone police called on them

~~~
asveikau
People have valid criticisms of Linus's delivery, but the content is often
good. I tend to remember some of the technical arguments in those rants years
after the fact, and cite them.

Keep in mind he did create the Linux kernel and git, so even if he delivers
them inexpertly, even on a bad day, he has some technical insight.

All that said: I agree there is some gender bias showing up on this thread.

~~~
Misdicorl
Oh, of course! If Linus wasn't special nobody would tolerate his style. Women
have to be special for society to tolerate sarcasm from them. I'm unsure how
old I'll be before a woman like Linus will be recognized rather than shoved
aside.

~~~
DoreenMichele
Thank you for your many excellent comments in this discussion. You fought the
good fight. You basically won in that this thread long ago ceased being at the
top of the page.

Take your winnings and go home. Linus is not above social censure. His team
reined him in not hugely long ago and the comment you are hissing at agrees
with your larger point that there's some gender bias happening here and was
uncommonly reasonable and evenhanded. I upvoted it.

I'm trying to be supportive. I'm trying to tell you "You've done enough.
Relax. Take a break. Feel okay about how this went down."

I mean if your mom is dying of cancer or something and screaming at internet
strangers is good distraction from more serious problems, cool. Don't let me
stop you.

But if the point was "Doreen is right: this thread shouldn't be at the top of
the page!" well, it's not anymore. Job well done. Have a cold brew or whatever
and feel okay about it.

~~~
Misdicorl
I'm not sure why my comment is interpreted as hissing/criticism. It was
intended as elaboration and agreement. Oh well, people seem to have not liked
it so I'll reconsider those types of posts in the future

~~~
DoreenMichele
In part because of the larger context. In part because it sounds like sarcasm,
not like you are genuinely agreeing that Linus actually deserves special
treatment because of his stature.

I've defended Linus once or twice. I'm also glad he chose to take some time
off and rethink things.

I can't think of any women we give similar accommodation to. That doesn't mean
they don't exist. But the reality is that Linus is in a league all his own. It
just sounds catty to make comparisons to him in that fashion.

I imagine if we genuinely had a "female Einstein," she would be pretty unique
and would carve out her own unique relationship to the world at large.

I'm sincerely not trying to bust your chops.

~~~
Misdicorl
Interesting. I suppose it can be read that way and I'll try to be more clear
in the future.

My point is that we do have examples of female excellence, but almost
invariably they are not uncouth. It seems more likely to me that the uncouth
ones are silenced than that only male excellence can come in a brusque box

~~~
DoreenMichele
Janet Reno used to refer to herself as _an awkward old maid_ to acknowledge
her lack of smoothness and more or less dismiss such criticisms. Depending on
your age, that might be before your time.

[https://en.m.wikipedia.org/wiki/Janet_Reno](https://en.m.wikipedia.org/wiki/Janet_Reno)

I'm short of sleep. I really don't desire to continue this discussion. I only
spoke up because you seemed really frustrated and I wanted you to feel okay
about how things went and that's apparently not your takeaway at all from my
comment.

------
crimsonalucard
There's a huge amount of technical jargon and sarcasm that makes it hard to
see her point.

Basically she's saying that python async (whose current state-of-the-art
implementation uses libuv, the same thing driving nodejs, and which
consequently suffers from the same "problems") doesn't have actual
concurrency. Computations block, and concurrency only happens in one very
specific case: IO. One computation can happen at a time with several IO calls
in flight, and context switching can only happen when an IO call in the
computation occurs.

She fails to see why this is good:

Python async and nodejs do not need concurrency primitives like locks. You
cannot have a deadlock happen under this model period. (note I'm not talking
about python threading, I'm talking about async/await)

This pattern was designed for simple pipeline programming for webapps where
the webapp just does some minor translations and authentication then offloads
the actual processing to an external computation engine (usually known as a
database). This is where the real processing meat happens but most programmers
just deal with this stuff through an API (usually called SQL). It's good to
not have to deal with locks, mutexes, deadlocks and race conditions in the
webapp. This is a huge benefit in terms of managing complexity which she
completely discounts.

~~~
eropple
_> Python async and nodejs do not need concurrency primitives like locks. You
cannot have a deadlock happen under this model period._

This is dangerously wrong and I would suggest that you reconsider the steps
that got you to this understanding because something really important has been
lost. It is absolutely critical to understand that _deadlocks_ are not why you
have locks. Correctness during concurrent operation is why you have locks.
Deadlocks are a failure state when you do not have correctness during
concurrent operation. So are things like double-increment and double-create.

Parallelism does not imply deadlocking, _concurrency_ implies deadlocking, and
both NodeJS and Python are concurrent runtime environments. And I can
guarantee you that, with a little skull sweat, you can write a deadlock in
NodeJS or Python. It is very easy. If you need some help, here's a trivial
example (and ordinarily I wouldn't use a link shortener here but this one is
hefty, it just goes to the Typescript playground):
[https://bit.ly/2Tvjyze](https://bit.ly/2Tvjyze)

Also, as a concrete, real-world, yes-it-happens-here example of where locking
is important, consider that I've recently built a dependency injection
framework in NodeJS--tried to use others' first, but my situation isn't
covered by existing ones--and had to resort to a mutex to avoid double
creation of objects within a single lifecycle. Creation of objects within this
lifecycle happens asynchronously--it has to, as the act of creating the
objects might itself rely on asynchronous operations. So, if I were to have a
diamond-dependency (A deps B and C, B and C dep D), I will non-
deterministically, and based on the creation time of B and C, create either
one or two instances of D. I rely upon a mutex, keyed upon the dependency
being created, to ensure that this does not happen.

.

I would also submit that perhaps you should adopt a principle of charity and
think real hard about whether your priors are correct before you start talking
about what she "fails" to see. Rachel is one of those people who has Been
Around and while I also have Been Around, I understand that Rachel has Been
Around More and I _probably_ should be listening more than I should be
smarming at her.

Just a thought.

~~~
crimsonalucard
>Parallelism does not imply deadlocking, concurrency implies deadlocking, and
both NodeJS and Python are concurrent runtime environments. And I can
guarantee you that, with a little skull sweat, you can write a deadlock in
NodeJS or Python. It is very easy. If you need some help, here's a trivial
example: [https://codesandbox.io/s/2wxvp](https://codesandbox.io/s/2wxvp)

Ok technically you're right. I am completely wrong when I say it can NEVER
happen.

But let's be real here, you introduced DEADLOCKING deliberately by introducing
LOCKS and by doing context switching at weird places to make it happen. When
nodejs came out one of the selling points was the lack of deadlocks and locks.

Case in point: there are no lock libraries in standard NodeJS.

Think about it, why is a LOCK needed here? Let's say you didn't have locks AT
all. Wherever the heck you are the current Node task technically has what is
equivalent to a LOCK on everything. Why? Because all node instructions are
atomic and single threaded. This is what replaces LOCKS in nodejs. Your code
example is just strange. The only place where your example is relevant is if
there was another process.

>But as a concrete example of where locking is important, consider that I've
recently built a dependency injection framework in NodeJS--tried to use
others' first, but my situation isn't covered by existing ones--

Probably because, again, nobody really programs using DI in node, let alone
context switching and adding made-up locks in the middle of all these
injections and constructions. Whatever you're doing is probably very unique or
(maybe, I don't know your specific situation) a sign of over-engineering. DI
is a very bad pattern and is one of the primary sources of technical debt in
code (especially when it's over 2 layers deep and in a diamond
configuration)... but that's another topic. Anyway...

What is the point of "acquiring" a lock if in node I have a "LOCK" on
everything? It makes no sense, whatever it is you're doing I am almost
positive that there is a simpler way of doing it. Either way the dependency
chain makes it obvious which needs to be created first and what can be created
concurrently. The below psuedo code should produce what you're looking for
without locks and with equivalent concurrency which is one of the main selling
points of single threaded async.

    
    
      b, c = await runAsync([B, () => C(await D())])
      a = await A(b, c)
    

B is evaluated with D async, C is kickstarted after D, with B still being
evaluated async. All of this blocks until both B and C are complete then A
evaluates. Whatever the heck you're doing with locks, things should happen in
the same order as the dependency chain in both my code and one with locks.
There's really no other order these things can be evaluated. I would even
argue that my code is indeed the canonical way to handle your diamond problem
in node, no lock code needed as expressed by the standard node library.

Think about it, node includes high level functions for http but none for locks
which are an even lower level concept than http. It must mean you aren't
supposed to use locks in Node.

I will say you are technically right in the fact that a deadlock CAN happen. I
was wrong in saying it can NEVER happen, but you have to realize that I have a
point here. Your example is really going very very far out of the way to pull
it off.

>I would also submit that perhaps you should adopt a principle of charity and
think real hard about whether your priors are correct before you start talking
about what she "fails" to see. Rachel is one of those people who has Been
Around and while I also have Been Around, I understand that Rachel has Been
Around More and I probably should be listening more than I should be smarming
at her.

I was not smarming her, whatever smarming means, I am disagreeing with her
just like you are disagreeing with me. There is NOTHING wrong with disagreeing
with anybody. What is wrong is when you are proven wrong and you don't accept
it. I accept that my statement of a deadlock NEVER happening in nodejs is
categorically wrong.

"Being around" does not entitle you to anything. I hate it when people say
this, nothing personal. Do you even know how long I've been around?
Additionally, the overall main point of my post still stands, which you didn't
even really address. I don't think Rachel gets the point of green threads. I
think we can both agree I've made a strong point and maybe you should use your
own charity principles on me.

Just a thought.

~~~
eropple
NodeJS also doesn’t have a function to convert camel-case to PascalCase,
should you not do that too because it’s not in the stdlib?

I’m going to be honest: you have not only not made a good point, you've gone
out of your way to actively ignore that problems around concurrency regularly
require one to use locks even in the absence of parallelism and have since
long before multicore computers, and you're being weirdly hot-under-the-T-
shirt besides.

Quit while you're behind, my dude.

~~~
crimsonalucard
>you're being weirdly hot-under-the-T-shirt

Yeah well your post was rude and condescending. What did you expect with that
attitude? Sure I'm angry, but there's nothing "weird" in my reaction given
your rudeness.

>NodeJS also doesn’t have a function to convert camel-case to PascalCase,
should you not do that too because it’s not in the stdlib?

This is entirely different from a highly concurrent framework not containing
lock primitives. A critical primitive is missing. It's like a math library
missing the addition operator.

>you've actively ignored that problems around concurrency regularly require
one to use locks even in the absence of parallelism and have since long before
multicore computers

We're not talking about multicore/singlecore stuff. We're talking about NodeJS
and Python Async Await and standard usage patterns.

There are other patterns that need locks but those are typically reserved for
programming things like databases... something that a typical web programmer
who writes NodeJS or Python doesn't deal with as web servers follow a
stateless pattern that considers the usage of global state as bad practice.

~~~
tgv
> We're not talking about multicore/singlecore stuff.

If you write a python or nodejs handler, stateless or not, that does two
subsequent async operations involving changes on shared resources, such as a
database table, you need locks, because another request may come in while the
first is waiting.

Perhaps you try to say that this is irrelevant when you allow only one request
at a time, but that's extremely limited and not the scenario under discussion.
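
A sketch of the hazard, with a module-level counter standing in for the shared
resource (a database row, say):

    import asyncio

    counter = 0

    async def unsafe_increment():
        global counter
        value = counter          # read
        await asyncio.sleep(0)   # any await: another request runs here
        counter = value + 1      # write back a now-stale value

    async def main():
        await asyncio.gather(*(unsafe_increment() for _ in range(100)))
        print(counter)           # prints 1, not 100: updates were lost

    asyncio.run(main())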

~~~
eropple
This is a really important point. Exporting your locks to Postgres means
neither that they stop existing nor that you can't wedge if it's not written
by clever programmers who better understand concurrency.

~~~
crimsonalucard
Yes it is an important point. Postgresql does not stop deadlocks or race
conditions from happening. You deal with those in Postgresql.

But this isn't the topic of the conversation is it? The topic is locks and
deadlocks in Python asyncio and NodeJS so ultimately irrelevant to your
initial example of the amateur diamond dependency injection where normally no
deadlock should be occurring regardless.

~~~
eropple
[I edited my post before his reply. Sorry for the confusion.]

Exporting your race conditions and washing your hands of them because the lock
mechanism lives on the other end of a network socket rather than in your
process space does not even rise to the level of “mere semantics”.

If you allow two NodeJS fibers to acquire remote locks—Redis redlocks,
whatever—out of order by way of making asynchronous requests to it (noted only
because you have a curious grip on that as being distinctive or meaningful
here), you’ve still deadlocked and it is for all meaningful distinctions a
deadlock _of your processes_ (N >= 1). I state this only for completeness;
there is no magic border at the edge of your process in which no, no, locks do
not happen here. Locks control concurrency. When the problem set requires
them, you use them.
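
To illustrate (a hypothetical sketch: in-process asyncio locks stand in for
the remote ones, and the names are invented), two tasks acquiring the same
pair of locks in opposite order will wedge a single-threaded process just
fine:

    
    
      import asyncio
      
      async def worker(first, second):
          async with first:
              await asyncio.sleep(0)   # yield so the other task grabs its first lock
              async with second:       # each now waits on the other's lock: deadlock
                  pass
      
      async def main():
          lock_a, lock_b = asyncio.Lock(), asyncio.Lock()
          # same two locks, opposite acquisition order
          await asyncio.gather(worker(lock_a, lock_b), worker(lock_b, lock_a))
      
      asyncio.run(main())   # wedges forever; no threads or parallelism required
    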

I do not understand the spam of capital letters or the weird aggression. It’s
like arguing about the coefficient of friction. The thing speaks for itself.

~~~
crimsonalucard
>I genuinely do not understand the spam of capital letters or the weird
aggression about a trivial reality.

Then you'd better get with the program. Talking to people the way you did
won't make you any friends and will gain you many enemies. Don't worry though,
I'm not that pissed off, just slightly miffed at your attitude. Also, I like
to use capital letters for emphasis; I guess you had a problem with that and
decided to make it personal. Just a tip: don't act this way in real life.
When you're older you'll understand.

>If you use Redlock to make a distributed lock over a Redis cluster and you
allow two NodeJS fibers to acquire resources locks out of order by way of
making asynchronous requests to it (noted only because you have a curious grip
on that as being distinctive or meaningful here), you’ve still deadlocked and
it is for all meaningful distinctions a deadlock of your processes (N >= 1).

Yeah because you're replacing your boolean in the earlier example with an
isomorphic value. Either use a global js variable or a global value from
redis. Same story. Nothing has changed from the locks you invented earlier.

Let me repeat my point. You shouldn't ever need to do the above in NodeJS,
because the area where asyncio in Python and NodeJS operate is stateless web
applications. That's why NodeJS doesn't have locks. You have to go out of your
way to make it happen.

>There is no magic border at the edge of your process in which no, no, locks
do not happen here. Locks control concurrency. When the problem set requires
them, you use them.

And your point is? I don't understand your point. Clearly nothing I said was
to the contrary.

Let's say your problem set is writing a database. Then locks make sense. Does
NodeJS make sense for this problem set? No. Do locks make sense for NodeJS?
No. Global mutable state is offloaded to external services, and that is where
the locks live. This is the trivial reality.

Let's stay on topic with reality. In what universe does your diamond
dependency need a dependency injection framework with locks in nodejs? If you
need locks, your fibers are sharing global state and you've built it wrong.

    
    
      # asyncio.gather used in place of the hypothetical runAsync helper
      async def c_chain(): return await C(await D())   # D then C, one concurrent task
      b, c = await asyncio.gather(B(), c_chain())      # B runs alongside the chain
      a = await A(b, c)

~~~
eropple
That you conflate “async IO” and _promises_ may be why you’re in this hole in
the first place.

Async IO uses promises to abstract out its select (or moral equivalent), but
promises are not async IO. A weirdly prescriptive attempt at dictating what
these are “for” doesn’t do much to obscure the thing; it speaks for itself.

I’m still really confused why a callback-hell topological sort and process
would be somehow better than a cache, a lock, and a breadth-first search—not
least because the latter is easier to follow and is also, anecdotally,
_faster_—but clearly these mysteries are just plain beyond my pay grade.

~~~
crimsonalucard
>That you conflate “async IO” and promises may be why you’re in this hole in
the first place.

It's all just single-threaded cooperative concurrency with context switching
at IO. The isomorphic APIs on top of this, whether callbacks, async/await, or
promises, are irrelevant to the topic at hand.

>I’m still really confused why a callback-hell topographical sort and process
would be somehow better than a cache, a lock, and a breadth-first search—not
least because it’s easier to follow and is also, anecdotally, faster—but
clearly these mysteries are just plain beyond my pay grade.

I'm confused as to what the hell you're talking about. "Callback-hell
topological sort and process": wtf is that? Where were callbacks used in my
example? Where was a sort used?

Do you not understand that the dependencies determine the order of
construction? That's it; no matter what technique you use, the overall steps
are the same. There is no BFS or callback hell going on. You manually
instantiate the dependencies and choose what's async and what is sync. No need
for locks.

Are you talking about something that takes a dependency graph and constructs
the instance from that? If you want to do that, your algorithm is incorrect.
You need post-order DFS; BFS won't work. But both BFS and DFS are O(N), so in
terms of traversal over the dependencies it's all the same.

    
    
      import asyncio
      from typing import Any, Awaitable, Callable, List, Optional
      
      class Node:
          def __init__(self, createAnObject: Callable[..., Awaitable[Any]],
                       dependencies: List["Node"]):
              self.dependencies = dependencies
              self.constructor = createAnObject
      
      async def constructObjectFromDependencyTree(root: Optional[Node]) -> Any:
          if root is None:
              return None
          # post-order: build all direct dependencies concurrently, then the root.
          # asyncio.gather stands in for the runAsync helper used earlier.
          instantiatedDeps = await asyncio.gather(
              *[constructObjectFromDependencyTree(node) for node in root.dependencies])
          return await root.constructor(*[i for i in instantiatedDeps if i is not None])
    

The algorithm is bounded by O(N), where N is the total number of dependencies.
If you want to construct an object with a total of N dependencies then no
matter how you do it, the operation will ALSO be bounded by O(N). In terms of
speed it's all the same, but the above is how you're supposed to do it.

The above algorithm should give you what you want while providing concurrency
and sequential execution exactly where needed. No callback hell, no promises,
no sorting, no external shared state and no locks.
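
For illustration, a hypothetical usage of the sketch above (the A/B/C/D
constructors are invented stand-ins; Node and
constructObjectFromDependencyTree are as defined earlier):

    
    
      # A depends on B and C; C depends on D
      async def D_ctor():      return "d"
      async def B_ctor():      return "b"
      async def C_ctor(d):     return f"c({d})"
      async def A_ctor(b, c):  return f"a({b}, {c})"
      
      tree = Node(A_ctor, [Node(B_ctor, []), Node(C_ctor, [Node(D_ctor, [])])])
      print(asyncio.run(constructObjectFromDependencyTree(tree)))   # a(b, c(d))
    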

Regardless, if you're building objects that necessitate such algorithms, you
are creating technical debt in the form of long chains of dependencies. You
should not be using your primitives to create large dependency trees; instead
you should be composing your primitives into pipelines.

Additionally, relegating so much complexity to runtime is a code smell. If
there aren't too many permutations, bring it down to manual construction in
your code rather than an algorithm/framework.

------
airstrike
At this point, I imagine there's likely a similar law to Betteridge's that
states:

 _Any headline that starts with "we have to talk about" can be answered by the
words "do we?"_

------
mesozoic
As much as I love Python, I still tell people not to use it for
performance-sensitive applications.

~~~
pdonis
It depends on what kind of performance you need. For CPU-intensive tasks I
would agree with you. But for network-I/O-intensive tasks, even though Python
is slow, it's still more than fast enough to keep up with a large request
volume, since network I/O latency is so much longer than CPU/memory latency.

~~~
dmurray
And for data science/ML things it works great these days, even though those
applications are 100% performance-bound.

------
viraptor
> It's around this time that you discover that people have been doing naughty,
> nasty things, like causing work to occur at "import time".

Is this something people actually have problems with in practice? I did lots
of Python and ran into it once. It was quickly fixed after I raised an issue.
I feel like non-toy development just doesn't experience it.

But maybe that's just my environment bubble. Do people who do serious Python
development actually have problems with this?

~~~
tgbugs
Python pretends to be a nice homogeneous "everything is at run time" language,
but it is all a big lie and there aren't big flashing letters saying "you
really probably shouldn't do this" when you start solving a problem in a
certain way. For example, it is almost certainly best practice to _never_ call
a function, class method, or static method inside a module that is going to be
imported, and certainly never instantiate a class. However, there are certain
patterns that almost necessitate it if you don't want to write loads of
boilerplate or deal with the performance overhead of metaclasses. There are also a
bunch of nice hacks like using `object()` at the top level as an instance
distinct from everything else, but I'm sure there is a way that `MYTYPE =
object()` will come back to absolutely ruin your day if you have to compare
two `MYTYPE` instances in two different dicts derived from a parent process
and a subprocess.
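
To illustrate the trap (a minimal sketch; `kind_of` is an invented helper):
identity holds in-process, and in a forked child that inherits the object,
but a copy that crossed a serialisation boundary is a brand-new `object()`.

    
    
      import pickle
      
      MYTYPE = object()   # module-level sentinel, distinct from every other value
      
      def kind_of(value):
          return "mytype" if value is MYTYPE else "something else"
      
      print(kind_of(MYTYPE))   # "mytype": fine in-process (and in a forked child)
      
      thawed = pickle.loads(pickle.dumps(MYTYPE))
      print(kind_of(thawed))   # "something else": the unpickled copy is a new object()
    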

I have personally made this mistake on two or three occasions where I
conflated file/module behavior with class behavior because I wanted a python
file to act like it was a bit more declarative. Unfortunately this leads to a
world of eternal pain. You can work around it, but you should have made
everything a python class and pretended the files/modules don't exist, or at
least remembered that they have staggeringly different semantics hiding behind
that innocent little `.` operator. Python simply cannot support solving a
problem in the shape the problem itself suggests; it forces you into its
happy-path patterns if you want the code to work in slightly different
run-time contexts.

Two relevant posts from Instagram engineering on the subject, which suggest
that best practices for avoiding these kinds of issues are non-obvious and
easy to miss.

[https://news.ycombinator.com/item?id=20708889](https://news.ycombinator.com/item?id=20708889)
[https://news.ycombinator.com/item?id=21284669](https://news.ycombinator.com/item?id=21284669)

~~~
viraptor
> but I'm sure there is a way that `MYTYPE = object()` will come back to
> absolutely ruin your day if you have to compare two `MYTYPE` instances in
> two different dicts derived from a parent process and a subprocess.

I don't see how this can be an issue with imports. You have two cases: you
imported the module with MYTYPE before or after the fork. Before: they will
compare fine (unless IPC is involved). After: you transferred the dict with
MYTYPE through some kind of serialisation or shared memory, and you cannot
compare identities - that's a property of IPC rather than something to do
with Python modules.

Anyway, what I meant in my previous comment was the risk-matrix view. Sure,
this can lead to bugs scoring various points on severity, but does it score
high on likelihood?

------
dirtydroog
Somewhat related to the RPC argument, but HTTP is a total joke, and therefore
so is REST.

In adtech you send 204 responses a lot. The body is empty, just the headers.
Headers like 'Server' and 'Date'. Apache won't let you turn Server off...
'security through obscurity' or some nonsense. Why do I need to tell an
upstream server my time 50k times per second?

Zip it all up! Nope, that only applies to the body, which is already empty.

Egressing traffic! A cloud provider's dream. I wonder what percentage of their
revenue comes from clients sending the Date header.

~~~
CameronNemo
There are zero response headers required for a 204 status.

See:
[https://tools.ietf.org/html/rfc2616#section-10.2.5](https://tools.ietf.org/html/rfc2616#section-10.2.5)

Seems like what you are having trouble with is Apache, not HTTP/1.1.
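
For what it's worth, a minimal sketch with Python's stdlib http.server (port
number arbitrary): send_response() is what adds the automatic Server and Date
headers, while send_response_only() emits just the status line.

    
    
      from http.server import BaseHTTPRequestHandler, HTTPServer
      
      class Bare204(BaseHTTPRequestHandler):
          def do_GET(self):
              self.send_response_only(204)   # status line only: no Server, no Date
              self.end_headers()             # then just the terminating blank line
      
      HTTPServer(("", 8204), Bare204).serve_forever()
    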

