The problem that's described here - "green" threads being CPU bound for too long and causing other requests to time out - is one that is common to anything that uses an event loop and is not unique to gevent. node.js also suffers from this.
Rachel says that a thread is only ever doing one thing at a time - it is handling one request, not many. But that's only true when you do CPU bound work. There is no way to write blocking-IO-style code at that kind of scale without some form of event loop underneath (gevent, async/await). You cannot spin up 100K native threads to handle 100K requests that are IO bound (which is very common in a microservice architecture, since requests will very quickly block on requests to other services). Or well, you can, but the native thread context switch overhead is very quickly going to grind the machine to a halt as you grow.
I'm a big fan of gevent, and while it does have these shortcomings - they are there because it's all on top of Python, a language which started out with the classic threading model (native threads), rather than this model.
Golang, on the other hand, doesn't suffer from them as it was designed from the get-go with this threading model in mind. So it allows you to write blocking-style code and get the benefits of an event loop (you never have to think about whether you need to await this operation). At the same time, goroutines can be preempted if they spend too long doing CPU work, just like normal threads.
> You cannot spin up 100K native threads to handle 100K requests that are IO bound (which is very common in a microservice architecture, since requests will very quickly block on requests to other services). Or well, you can, but the native thread context switch overhead is very quickly going to grind the machine to a halt as you grow.
Why? Bigger MySQL servers regularly use 10k native pthreads which spend most of their time waiting on IO.
Context switch overhead isn't linear with the number of threads! The Linux scheduler runs in O(log N). Moving from 10k MySQL threads to 1 thread per core will give you less than 10% extra throughput.
Doing IO means context switching. Goroutines don’t get a free pass here.
There are other costs to regular context switching as opposed to goroutines/greenlets (the green threads that gevent uses). I don't remember the details, but specific attention was paid to making context switching and other resource consumption cheaper for these green threads than for native threads, so I suggest reading about it in the greenlet/Golang docs :) You can also try searching for C10K, the term people used for the problem of handling 10K concurrent connections, which is often associated with cooperative threading.
For example, the cost of the context switch itself (storing all registers) is more significant with native threads.
Just try spinning up 100K threads that each print a line and then sleep for 10ms, and see how high your CPU usage gets.
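A quick way to try it yourself (a minimal sketch, not a rigorous benchmark; the thread count and sleep interval are arbitrary, and you'll likely need to raise ulimits and start with fewer threads):

    import threading
    import time

    def worker(i):
        print(f"thread {i} started")
        while True:
            time.sleep(0.01)   # sleep 10ms, wake up, repeat

    # Spawning this many native threads needs generous ulimits and RAM;
    # watch CPU usage in top/htop while it runs.
    threads = [threading.Thread(target=worker, args=(i,), daemon=True)
               for i in range(10_000)]
    for t in threads:
        t.start()
    time.sleep(60)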
Also, doing IO does not necessarily mean context switching - it means calling into the kernel (system calls). If you use an async IO operation (read/write from a socket) and then continue to the next thread, by the time you're done with all the ready threads, you're likely to have some sockets ready to read from again, so you might not context switch at all. Kernel developers are working on reducing even the need for syscalls with io_uring, which is designed to let you perform IO with hardly any system calls.
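The single-threaded, service-whatever-is-ready pattern being described looks roughly like this (a bare-bones echo server sketch using the stdlib selectors module, which sits on top of epoll on Linux; port and buffer size are arbitrary):

    import selectors
    import socket

    sel = selectors.DefaultSelector()          # epoll on Linux

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 8080))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    while True:
        # One syscall reports every socket that is ready right now; we then
        # service them back to back without any kernel thread switches.
        for key, _ in sel.select():
            sock = key.fileobj
            if sock is server:
                conn, _ = server.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)
                if data:
                    sock.send(data)            # naive echo; real code would buffer unsent bytes
                else:
                    sel.unregister(sock)
                    sock.close()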
Green threads are much cheaper to switch than pthreads, yes. In real applications the difference is far smaller than it was 20 years ago when C10k was challenging. In 2020 you can just open 10k threads and forget about it.
With 100k threads and 100k Goroutines, each doing nothing but waiting on a mutex: pthreads in C take ~20 microseconds per thread, and in Go it's about ~5 microseconds per goroutine.
This difference disappears really easily. Parse some JSON and it’ll be gone.
Entering kernel code is the expensive part of context switching so syscalls are very nearly as expensive. Reading from a socket still needs a syscall, even with green threads or asynchronous IO.
The more different bits of IO you do, like in a real web app, the less advantage there is to green threads. This is one reason Rust dropped its M:N threading implementation.
There shouldn't be any 100k hard limit for threads at least in Linux, though you need enough memory for 100k stacks of course. You need to increase some default limits for it though (https://stackoverflow.com/a/26190804)
Assuming a generous(?) 20 kB per thread in stack and other corresponding OS bookkeeping information, you could have 1k threads in 20 MB, or 1M threads in 20 GB.
Doing 100 Hz timer wakeups and IOs concurrently in 100k threads makes 10 M wakeups/second, that takes a chunk of CPU independent of green / native threads choice. Performance vs kernel threads will depend on the green threads implementation.
It's worth noting that the c10k writeup came out 20+ years ago, and those bottlenecks have been addressed both by fixing software bottlenecks and 20 years of semiconductor improvements.
To expand on this, by allowing preemption on function calls and simply not providing a loop mechanism, you can guarantee an upper bound on how long a process may need to wait until it can be scheduled.
Guarantee is a strong term. If you have an unbounded number of processes to run, there are no guarantees on how long it takes until any given one is scheduled.
Similarly, with cooperative multi-tasking as in Gevent, you can manipulate scheduling to try to provide better guarantees about wait times. It's just... you can't ignore the problem.
> You cannot spin up 100K native threads to handle 100K requests that are IO bound (which is very common in a microservice architecture, since requests will very quickly block on requests to other services)
The Varnish server does that all the time and it is not a problem.
Modern OSes handle a large number of threads much better than 10 years ago.
In the case of Python, however, you cannot do that due to the GIL and the fact that Python handles native threading terribly.
> We rarely recommend running with more than 5000 threads. If you seem to need more than 5000 threads, it’s very likely that there is something not quite right about your setup, and you should investigate elsewhere before you increase the maximum value.
Yes, and still 5000 threads will be more than enough to serve most production loads, even on high-traffic websites.
In Node.js and Python async/await you don't have to deal with deadlocks and mutexes and other ugly concurrency primitives/problems. The reduction in program complexity is huge, which is something Rachel completely discounts.
I'm not sure she discounts it. I mostly do firmware; in that environment you can use either preemptive or cooperative multitasking. The latter is much less likely to result in corrupted state and results in smaller code size, which is why I like it. But you pay with the latency and task starvation problems Rachel complains of.
So in another reply someone pulled off a deadlock in Node by creating their own external lock primitive outside of the Node stdlib, which itself doesn't contain a single lock primitive. Technically I'm wrong, it can happen, but only if you go out of your way to do it.
Yes, but you shouldn't just do that every time as calling into the event loop has its own processing cost. If you do that on each iteration of the loop, it will probably become much slower.
At a previous job, I wrote a wrapper that would check how long it had been since the event loop scheduler last ran, and if it was larger than some threshold, it would yield (by sleeping for 0ms). This was intended for use in loops of varying length, which might be long or short depending on some external factors, but where yielding on each loop iteration made them slower by an order of magnitude.
IIRC it was sending messages to RabbitMQ, which somewhere down the line is writing to a socket, but not necessarily blocking - if the socket manages to flush its buffer fast enough (faster than the processing code can send messages), writing to it may never block, resulting in a CPU bound loop (since the loop may perform work other than sending).
We didn't want people to have to think too hard about how their loop was going to behave (especially as it might mean reasoning about the internals of a third party library), and so the wrapper was born. If your loop was short enough, all it did was compare ints so the cost was negligible.
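Something along these lines, reconstructed as a minimal sketch (the original was gevent-based; the names and the 50ms threshold here are invented for illustration):

    import time
    import gevent

    _last_yield = time.monotonic()

    def maybe_yield(max_block_seconds=0.05):
        """Yield to the gevent hub only if we've been hogging it for too long."""
        global _last_yield
        now = time.monotonic()
        if now - _last_yield >= max_block_seconds:
            gevent.sleep(0)                 # 0ms sleep: let other greenlets run
            _last_yield = time.monotonic()

    # Called once per iteration of a loop of unpredictable length: short
    # iterations only pay for a float comparison, long runs hand control
    # back to the event loop at a bounded interval.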
Doesn't that mean you have to somehow know when it's been running too long and then yield control back? What if the slow operation is something atomic, like multiplying some huge numbers?
Well, the example given is decoding JSON. If that is happening in a long loop, you can yield once per iteration and be safe. Not all problems break apart neatly like that, but in those cases how much of a chance does the server have to not time out regardless, you know?
Note that once per iteration might be too often, but you can just measure how long an iteration typically takes, compare it to how soon you want to preempt the task, and then yield at the right interval.
Seems like abstractions will bite you there: most people will just do cool_library.unmarshall(request) in some variant, and those libraries will not have the same mechanism for yielding that you have.
Abstractions are meant to be broken! One could probably work around this problem by adding new functions to cool_library, or modifying existing ones, whose code would be copy-pasted from the library but with some asyncio.sleep(0) calls spliced in at strategic places :). For legacy projects, it may make more sense to cheat like this than to rewrite the whole project in a saner tech stack.
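A toy illustration of that cheat (cool_library and parse_one are hypothetical; the point is just the asyncio.sleep(0) spliced into the copied loop):

    import asyncio

    async def unmarshall_cooperatively(raw_items):
        # Body copy-pasted from cool_library.unmarshall, with yields spliced in.
        result = []
        for i, item in enumerate(raw_items):
            result.append(parse_one(item))     # hypothetical per-item parser
            if i % 1000 == 0:
                await asyncio.sleep(0)         # hand control back to the event loop
        return result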
Before Web Workers this was how you did things in the browser to avoid the "page is not responding" popup on computationally expensive operations: break each big operation into many small operations and step to the next phase using setTimeout(..., 1).
> The problem that's described here - "green" threads being CPU bound for too long and causing other requests to time out - is one that is common to anything that uses an event loop and is not unique to gevent. node.js also suffers from this.
I don't know, but I was actually discussing the case where the thread is performing pure CPU operations and not doing any IO. In this case the JS code does not yield to the event loop, so any other requests waiting will stall.
This is actually an important property of JavaScript that allows it to work without locks (as long as you do not await, your code runs completely synchronously and will not be interrupted). It's not a flaw, but definitely requires awareness.
Gevent code that doesn't spawn more than one native thread (the one that runs the event loop) also has this property - you don't need locks as long as you do not perform IO. In Python's case it can be more tricky, as you might end up yielding to the event loop by indirectly performing IO when logging or something similar.
In JavaScript, the only case, AFAIK, where this can happen is when you await. Nothing else will cause you to yield to the event loop.
The JS event loop is kind of funny, because tasks have no priority and no preemption and so they're executed first-in first-out. If you want to avoid blocking other code paths for more than a certain length of time you need to change the order you push tasks in[0] and you need to limit execution time in some way, which can get pretty complex to reason about.
[0] by "push tasks" I mean using setTimeout, new Promise(), await, requestAnimationFrame, or similar
Ok, so it is the opposite in nodejs? There's a single JS thread, but as soon as it performs any async IO (which is the only IO that one should be performing in node), that IO creates a thread servicing that IO request. In addition to that, if one is crazy enough to run long computational operations there's setTimeout(), setImmediate() and process.nextTick() and friends.
Though I would say doing any real compute work inline rather than farming it out to workers is a boneheaded idea regardless of the language used.
AFAIK there is only ever just one thread (that runs user code). setTimeout, etc are all also invoked by the event loop.
I can vaguely recall (but it's 1AM so I didn't verify - take with a grain of salt) that nodejs uses a thread pool for IO, but I guess that's only for IO for which there is no async API, otherwise that would be wasteful.
I imagine a quick search about the event loop and said thread pool would yield better researched answers :)
Nodejs is single process, single thread and single core. Chrome used to run different tabs as threads of a process but that ended since Spectre and Meltdown. Back to single process since then.
> async IO creates a thread servicing that IO request
Nope. The event loop is a single FIFO queue implemented with libuv. It differentiates between sync and async functions and basically goes around the queue, running synchronous functions to completion every time it finds one and async ones for a set time before switching to the next one.
There are Linux system calls in wide use that are blocking. Both node.js and Golang use native threads to make the system calls and "park and wait" for the response. The runtimes then take the information from the call and incorporate it back into the user code threads (the one user thread in node.js).
The implementation of libuv under nodejs (and the state of the art python async/await) actually uses a total of three threads. It's a complicated model and it's not even strictly correct to call it an "event loop." I don't completely understand it and very few people do.
From a high level you can think of it as a single thread that can only context switch during an IO call. Otherwise the function has to run and block all the way to termination. <---- This is what rachel is complaining about.
As long as my 100k threads are all waiting on io the kernel doesn't need to switch to them.
I get that once you reach full utilization, the best the kernel can do is to make it fair, and that's going to cost. But that's not the same as a general statement like "100k threads grinds the machine to a halt".
I have worked on a system that spins up 50k threads on a CPU with 72 threads and it works just fine: it's the Google web page fetching system. If that didn't work then Google wouldn't fetch web pages. At all.
> Go back and look. I said that it forks and then it imports your app. Your app (which is almost certainly the bulk of the code inside the Python interpreter) is not actually in memory when the fork happens. It gets loaded AFTER that point.
You can just pass `--preload` to have gunicorn load the application once. If you're using a standard framework like Django or Flask and not doing anything obviously insane then this works really well and without much effort. Yeah I'm sure some dumb libraries do some dumb things, but that's on them, and you for using those libraries. Same as any language.
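For reference, preloading can also be turned on from a config file rather than the command line; something like this (module path and worker count are just placeholders):

    # gunicorn.conf.py
    preload_app = True    # import the app once in the master, before forking workers
    workers = 4
    bind = "0.0.0.0:8000"

    # then: gunicorn -c gunicorn.conf.py myapp.wsgi:application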
If you want to stick your nose up at Python and state outright "I will not write a service in it" then that's up to you, it just comes across as your loss rather than a damning condemnation of the language and its ecosystem from Rachel By The Bay, an all-knowing and experienced higher power. I guess everyone else will keep quickly shipping value to customers with it while you worry about five processes waking up from a system call at once or an extra 150MB of memory usage.
> The code objects and other things that are immutable. (CPython refcounts those too?)
CPython refcounts all objects. Refcounting is not required because of mutability; it's required because the interpreter needs to know when an object's memory can be reclaimed for something else.
I don't know if code objects specifically would have their refcounts mutated a lot, since typically they're only referenced by one object, the function that they're the code for. But function objects will have their refcounts mutated every time the function is called, since that sets up a stack frame that grabs a reference to the function object and then releases it when the function returns.
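You can watch this happen with sys.getrefcount (a rough sketch; exact numbers vary by CPython version, but the count observed while the function is running is typically higher than at rest):

    import sys

    def f():
        # While f is executing, the call machinery (the caller's value stack,
        # and on newer CPythons the frame itself) holds extra references to f.
        return sys.getrefcount(f)

    at_rest = sys.getrefcount(f)
    while_running = f()
    print(at_rest, while_running)   # while_running is typically larger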
> If you're using a standard framework like Django or Flask then this works really well and without much effort.
I dug into it about a year ago: Django loads almost everything lazily, so a simple --preload did next to nothing. I had to write code to load the app for real at import time, the exact thing the article and common wisdom tell us not to do.
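The kind of "load it for real" code being described often ends up looking something like this (a sketch; the get_resolver() trick to force view imports is a common one, but what else needs touching depends on the project):

    # wsgi.py
    from django.core.wsgi import get_wsgi_application

    application = get_wsgi_application()

    # Force the URLconf (and therefore most view/serializer modules) to import
    # now, in the gunicorn master, so the memory is shared with forked workers.
    from django.urls import get_resolver
    get_resolver().url_patterns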
Django loads some things lazily but not almost everything. Nearly all your imports should be loaded by the preloaded application and shared across forks (COW semantics aside), and this usually takes up a non-trivial amount of memory. The things that are lazy are usually lazy for a reason - database connections, caching etc.
I believe the i18n system is also lazily loaded and depending on the languages you configure it can take up a fair bit of memory.
The whole discipline Rachel writes about is clearly intended for mature, scaled operations where outages and inefficiencies legitimately cost much more than the systems wizards needed to stop them. There's a time and a place for "move fast and break things" and if that's where you are, it's probably not for you.
I don't think you guys are saying different things here.
The article in this case is describing a bunch of common processes/optimisations/features that we have learnt to be critical for effective and efficient running of software. The author does this because the audience she writes for is, as the previous comment puts it “mature, scaled operations where outages...” etc etc
> You can just pass `--preload` to have gunicorn load the application once. If you're using a standard framework like Django or Flask and not doing anything obviously insane then this works really well and without much effort. Yeah I'm sure some dumb libraries do some dumb things, but that's on them, and you for using those libraries.
It's not always trivial to ensure none of your dependencies have import-time side effects. Sometimes the productivity/business benefit provided by the dependency outweighs the pain introduced by the side effects.
If spinning up a few more workers will solve a performance problem for you, it’s probably worth the time to throw the preload flag on there and see what it does to your test suite. Since you are already cost optimizing at this point you probably have the time.
> everyone else will keep quickly shipping things with it while you worry about five processes waking up from a system call at once or an extra 150mb of memory usage
With the current state of ecosystems, this quality-vs-quantity mutual exclusivity is much less pronounced. These days, you can fire up these services as quickly or quicker than in Python, with better performance and resource usage that is also more maintainable. Unless you speak of highly ecosystem-dependent libraries (e.g. ML), Python defenses that rely on time to market say more about the author's narrow comfort than general expediency.
Of course weird domain specific libraries are a major reason to use python. I can get my oddball service up in a few days vs weeks or months to replicate some bizarre library.
If all you're doing is basic web stuff then sure, you can do it in the language du jour, but even after a decade Go doesn't have a competent XML library, for example.
Time to market is killer. Python buys you time to determine whether your product is even worth building. Deployment sucks though.
e: I literally could not have written the services in my current role in a language other than Java or Python without replicating 100kloc libraries. Java would have required a bunch of work to integrate with the other services we had. So: python. If that costs me an extra $1k/mo for servers but gets us a customer paying $100k a month, was it wasteful?
Why was it easier to integrate with those other services from Python than from Java? Only because you already picked Python for those, right?
(I agree with you about time to market being the important thing. I don't think Python is winning that game any more though: its dependency management and deployment has fallen behind the rest of the industry, and newer languages have largely caught up with its conciseness without having to make the same compromises)
Not really. Setting up a scalable Java service is complicated even with tooling like Spring Boot, and Java has better libraries for our use case, but the feedback cycle and general code velocity would have been much slower since we'd be on Java 8, among other things. Plus learning Spring or a Java EE framework is tantamount to learning a whole new language; it wasn't worth the time then.
> Setting up a scalable Java service is complicated even with tooling like spring boot
How so? You can use much the same techniques you would in Python, or you can deploy a .war to a bunch of application servers and achieve what you'd do with docker/kubernetes/etc. in a much simpler way. You've also got a much better chance of scaling up with a single instance and not needing to scale horizontally.
> the feedback cycle and general code velocity was much slower since we'd be on java 8 among other things
What's keeping you on Java 8? Major JVM version upgrades are much easier and safer than even minor Python upgrades. I'm not doubting your situation, but old version of one language versus new version of another is not really a fair basis for comparison.
> Plus learning spring or a Java ee framework is tantamount to learning a whole new language it wasn't worth the time then.
Sure. I'm not saying it's wrong to choose to stick with the technology you're currently using - there's definitely a cost to switching or learning something new. But it's worth being conscious of whether your technology choices are being driven by legacy constraints and whether you'd want to make a different choice on a green-field project.
1. "Deploying a war to an application server" is a giant pain in the ass when you don't have people deeply familiar with EE app servers and tuning them. Python is not the best choice here but little can beat go's scp deploy
2. Tell that to every library that relied on Java EE being shipped with the JDK. A lot of the reason to use Java was these libraries that are, if you're lucky, in maintenance mode. They're still stuck on 8 since it's not worth putting in the time to migrate them to 11+ (I don't have that luxury, unfortunately)
3. > technology choices are being driven by legacy constraints
This project was greenfield but the problem domain is plagued by legacy constraints (all the way back to mainframes). "Rebuild from scratch" in a nicer, newer language would take years compared to a runway measured in months, so you do the math.
> Python is not the best choice here but little can beat go's scp deploy
Java is pretty damn close to that if you take the route of building a shaded jar (and embedding jetty if you need a web server). You need to install the JVM on the target server but that's all.
> Tell that to every library that relied on java EE being shipped with the JDK. A lot of the reason to use Java was these libraries that are, if you're lucky, in maintenance mode.
I don't think I ever saw a library like that? There's a huge, high-quality ecosystem of open-source Java libraries and I've never heard of any of them being reliant on Java EE.
> This project was greenfield but the problem domain is plagued by legacy constraints (all the way back to mainframes). "Rebuild from scratch" in a nicer, newer language would take years compared to a runway measured in months, so you do the math.
Sure, but you can't say this project is an example of when Python is a good technology choice if really the main reason you were using Python was because of legacy constraints.
Citation needed. You can write anything you want in any language you want, but if your team is experienced with Python then they will continue to ship value quickly. Sure, maybe if they were all well versed in brainfuck they could ship things quicker.
Narrowing in on the rather specific point about shipping things with Python and ignoring the larger argument that it doesn't freaking matter if things are not as efficient as they could possibly be is quite odd to be honest. I'm sure some of the arguments in the blog post would apply to whatever language you had in mind while writing your reply.
Yes after reading through the article it's not very clear to me what the actual problem is with using Python/Gunicorn/Gevent.
The author seems to be saying something about how if a worker is busy doing CPU intensive work (is decoding JSON really that intensive?) then other requests accepted by that worker have to wait for that work to complete before they can respond, and the client might timeout while waiting?
If that's the case:
1. Wouldn't this affect any language/framework that uses a cooperative concurrency model, including node.js and ASP.NET or even Python's async/await based frameworks? How is this problem specific to Python/Gunicorn/Gevent?
2. What would be a better alternative? The author says something about using actual OS-level threads, but I thought the whole point of green threads was that they are cheaper to switch than native threads?
1. Yes, it would affect other things. This is just an illustrative example.
2. Green threads have lower overhead, but it's a false economy if it causes you to needlessly redo work because of timeouts that could have been avoided.
Which it seems it must, because the kernel doesn't have the insight to know whether a green thread is doing that epoll because it's ACTUALLY idle, or because it's not idle but is willing to try to juggle a second (or third) thing while it has something on the back burner. So the kernel indiscriminately assigns work to threads without regard for whether they are juggling a lot or nothing.
Whereas with native threads, they never ask the kernel for more work while they're blocked on something else because they are literally blocked and thus won't be making that epoll system call.
(The article also mentions something about LIFO policy, which exacerbates the problem because it favors assigning work to the process which is likely to already have most of it.)
How come there's no work stealing? Green threads are supposed to be backed by some N:M thread pool, no?
Also, isn't the problem that JSON decoding (or whatever computation) simply block the thread and the other green threads cannot proceed at all, because there are simply no safepoints (yield points) inside these low level functions?
And in all these cases shouldn't the application estimate work (eg. in case of JSON if the string is longer than 100K), and if it's too big just put it on a dedicated heavy compute N:N thread pool?
For Python it's best practice anyway because of the GIL, no?
Python was always more about breadth than depth. (CPython is full of known inefficiencies, but it's been with us since 1989, and basically the core dev who worked most on performance - Victor Stinner - thinks the best way forward is to introduce subinterpreters - https://github.com/vstinner/talks/blob/master/2019-EuroPytho... )
Oh, that PDF is interesting, Python 3.8 has shared memory for multiprocessing, no more pipe objects between processes.
Furthermore, extensions and internal stuff have always had the ability to release the GIL and do their own thing (for example, on a threadpool, or using async/nonblocking I/O). But I have no idea about Gevent. I never liked it. (Just as with Twisted/Tornado, it was too much magic for too little benefit.)
I don't know what gevent is doing. But if you have a global interpreter lock then M:N might not be worth it, since only one thread will make progress (outside of the syscalls, which are non-blocking).
Surprised nobody mentioned asyncio.run_in_executor yet. It's designed to offload long-running CPU-bound tasks from the event loop by moving them to another thread pool (or process pool if you are afraid of the GIL). Eventually that pool will obviously also get starved given enough load, but at least you won't have CPU work blocking IO and vice versa. The tricky thing is knowing when an operation might grow to become too slow for the IO thread given dynamic inputs.
that's because `run_in_executor` doesn't spread CPU usage. All it does is wrap functions in threads so you can call them async. It doesn't create multiple processes so you're still limited to a single core in Python.
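For reference, passing an explicit ProcessPoolExecutor (rather than relying on the default thread pool) is how you get off a single core with run_in_executor; roughly:

    import asyncio
    import concurrent.futures
    import json

    pool = concurrent.futures.ProcessPoolExecutor()   # separate processes, so no GIL contention

    async def parse_big_payload(raw: str):
        loop = asyncio.get_running_loop()
        # The event loop keeps servicing other requests while a worker
        # process chews on the CPU-heavy decode.
        return await loop.run_in_executor(pool, json.loads, raw)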
Node.js will have that issue, and in fact the stdlib JSON encoding/decoding can't even be paused, so once you start processing something you're stuck until it's done. You could, however, write an incremental serializer/deserializer that could spread processing out across many event loop cycles to mitigate this.
Go, ASP.NET, and others not so much, depending, because the schedulers can pause and resume the tasks (on top of being threaded).
> 1. Wouldn't this affect any language/framework that uses a cooperative concurrency model, including node.js and ASP.NET or even Python's async/await based frameworks? How is this problem specific to Python/Gunicorn/Gevent?
I think she's against anything that has that problem. Not every green thread implementation has that problem. For example Go doesn't have that problem. Because there were 4 CPU threads (I think) and only 2 things needing to be done, with Go's M:N scheduling those 2 things would be sure to both be running.
> The author seems to be saying something about how if a worker is busy doing CPU intensive work (is decoding JSON really that intensive?) then other requests accepted by that worker have to wait for that work to complete before they can respond, and the client might timeout while waiting?
Yes. Decoding JSON with Python is CPU-intensive.
This is a very simple shell script around Python that is designed from the get-go to crash with an exception. However, it may not be the exception you expect:
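(The snippet itself isn't reproduced above; a minimal reconstruction of the kind of experiment being described is below - the nesting depth is arbitrary.)

    import json

    # Build an absurdly deeply nested document and hand it to the stdlib decoder.
    depth = 100_000
    payload = "[" * depth + "]" * depth
    json.loads(payload)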
Python's docs suggest you should arrive at one of two errors here: MemoryError (we are trying to parse something sizeable) or json.JSONDecodeError (the JSON is invalid).
You won't. You'll hit RecursionError.
Because, despite how badly Python deals with recursion, the JSON library depends on it extensively. Which means that there is a huge stack being built up every time you try to decode JSON in Python (dictionaries, function call overhead, etc.), only for it to be thrown away.
In Python, everything is generally CPU intensive compared to what it would be in compiled languages. Even though things like JSON decoding usually happen in a C library, Python programs that do close to nothing still use way more CPU than you would if you were running on the JVM, or in Go, C, whatever.
> Wouldn't this affect any language/framework that uses a cooperative concurrency model, including node.js and ASP.NET or even Python's async/await based frameworks? How is this problem specific to Python/Gunicorn/Gevent?
CPU bound-ness affects all of these platforms, yes. It affects Python and other interpreted languages the most, however, because these platforms become CPU-bound the most quickly. Also, applications that are written in scripting languages tend to have a lot of business logic going on in the first place; after all, if you just wanted to serve static pages you could use Apache with the event MPM, and if you wanted to proxy HTTP requests you'd use HAProxy; both event-based systems that are very much not CPU bound.
But yes, most importantly, Python's asyncio system is completely impacted by these same issues and I would have preferred she address that, as asyncio is part of the standard library now and is way more popular than gevent.
> What would be a better alternative? The author says something about using actual OS-level threads but I thought the whole point of green threads was that they are cheaper than thread switching?
I will grant she lost me a bit with the "use a real RPC system with <feature> <feature> <feature>" thing. Additionally, the "load the application in the child process" thing is pretty typical; a worker process should obviously have either threads or green threads in use so that each process can handle multiple concurrent requests, but only as many as you'd want handled effectively by one core, since the GIL is going to enforce that (another thing you wouldn't have to deal with in other languages, such as the compiled ones mentioned above). But it's typical that child processes are going to have a mostly original copy of things.
But as far as the "context switching" thing goes, I've yet to see benchmarks that show the overhead of OS-level context switching actually being more of a performance burden than the less frequent, but more work-intensive, context switching that user-space schemes like asyncio have to use. If you are writing a logic-heavy, or even logic-light, service that handles requests in Python, you will also have to worry about CPU-bound issues all the time. Using regular threads with processes, like what you get using something like mod_wsgi, will allow individual processes to attend to web requests more evenly. With mod_wsgi you can configure worker daemons that run multiple OS-level threads and you can also have multiple daemon processes.
I'm not sure if the multi-process model used by mod_wsgi has solved the accept() problem; however, in my experience the bigger problem is when a service configures itself to allow 1000 greenlets within each process while each process is realistically capable, from a CPU perspective, of handling maybe 5 or 10 concurrent requests. There's no mechanism that ensures each process gets an even balance of requests. That is, you might have all your requests waiting in one process, because you told them it can process 1000 at a time, while other processes are idle.
TL;DR I'm in the "event based programming is extremely overrated in Python" camp.
> Python programs that do close to nothing still use way more CPU than you would if you were running in the JVM, or Go, C, whatever.
Yeah but that's not exactly shocking news to anyone, is it? People generally choose Python for other reasons (productivity, library ecosystem, etc.) because the performance is "good enough" for most web apps, and if you reach a traffic level where performance becomes an issue that's a good problem that you can optimize for later. (Like Facebook did with PHP, Twitter did with Ruby, etc.)
> But yes, most importantly, Python's asyncio system is completely impacted by these same issues and I would have preferred she address that, as asyncio is part of the standard library now and is way more popular than gevent.
Right, the blog post gave me the impression she was calling out the combination of Python/Gunicorn/Gevent specifically for some reason. But if the underlying goal was to just point out that Python is slow then I am curious what people think the right solution is? Just switch out of Python and use Go or something else?
I love to work in Python and I came here just to point out that it's pretty obvious one should not use it if raw performance is a concern (which it is not in many situations). I remembered this Cal Henderson talk at Djangocon:
That said, I wrote a bit of Go recently and the experience was pleasant enough that I'd consider it for future works (should the needed libraries exist), as the extra performance and ease of deployment comes with very little effort from the developer.
The underlying goal was to show that python gets CPU bound very easily, this is not at all the same as saying "python is slow". If I want slow I'd use Ruby.
It's a shame this is not a top voted comment. Not many people understand both the intricacies and tradeoffs involved in all these approaches, and get burned, then blame the tools.
It seems to me that this submission is getting a lot of blowback in the comments for 1) the style and 2) the implication that wiring up Python services with HTTP is bad engineering. I don’t think this is productive.
On the first point, yeah Rachel’s posts are kinda snarky sometimes, but some of us find that entertaining particularly when they are highly detailed and thoroughly researched. I’ve worked with Rachel and she’s among the best “deep-dive” userspace-to-network driver problem solvers around. She knows her shit and we’re lucky she takes the time to put hard-earned lessons on the net for others to benefit from.
As for “microservices written in Python trading a bunch of sloppy JSON around via HTTP” is bad engineering: it is bad engineering, sometimes the flavor of the month is rancid (CORBA, multiple implementation inheritance, XSLT, I could go on). Introducing network boundaries where function calls would work is a bad idea, as anyone who’s dealt seriously with distributed systems for a living knows. JSON-over-HTTP for RPC is lazy, inefficient in machine time and engineering effort, and trivially obsolete in a world where Protocol Buffers/gRPC or Thrift and their ilk are so mature.
Now none of this is to say you should rewrite your system if it’s built that way, legacy stuff is a thing. But Rachel wrote a detailed piece on why you are asking for trouble if you build new stuff like this and people are, in my humble opinion, shooting the messenger.
"JSON-over-HTTP for RPC is lazy, inefficient in machine time and engineering effort, and trivially obsolete in a world where Protocol Buffers/gRPC or Thrift and their ilk are so mature."
Laziness is a virtue in our profession :) Huge "citation required" on those alternatives being more efficient in engineering effort, and most importantly JSON-over-HTTP is ubiquitous and interoperable with any language without having to rely on a gRPC or Thrift implementation and tooling for that language/platform.
Re machine-time efficiency: to the extent I understood what she was actually complaining about, simply none of those issues are attributable to JSON-over-HTTP being used.
On the digression of efficiency... treating efficiency as something you solve by throwing more machine resources at it is causing recognizable problems today.
And the calculus of cost doesn't even always work out the same.
That's a prime example of survivorship bias: projects that spent their runway optimizing for machine resources are not around to have their problems recognized.
I had more than one project where people were cheaper than machines, and going tight on budgets for machine resources led to the project surviving instead of dying.
Hell, a big example of that is Stack Overflow, which runs a very busy site on much less hardware than they would have needed otherwise, just by tackling high-level optimization questions up front.
I can spot, however, kernel-mode HTTP servers (including customized ones), and heavy use of a pretty advanced stack with good optimization capabilities. The choice of stack does make a big impact, something that they have mentioned several times, with the summary of "paying for Microsoft licenses paid off very well compared to using popular open-source stacks".
Remember, performance is also a feature: both a non-functional one (to reduce your costs) and a functional one (to have happier users).
> we’re lucky she takes the time to put hard-earned lessons on the net for others to benefit from.
I genuinely don't see much of a lesson to learn from this particular blogpost, and it appears neither did many others in HN. If there is one, beyond "don't use x", it's hard to find it.
I get the impression that this particular post is being upvoted to the top of HN because of who the author is, not necessarily because this post itself has value. This results in a whole bunch of others reading it, wondering why they're wasting their time with such a rambling post.
2. Remember that green threads tend to have problems with fairness of scheduling.
3. JSON decoding gobbles CPU.
4. Scheduling fairness problems increase response time variance.
4½. Green threads also increase it.
5. Don't forget to take retries of timed-out requests into account in protocol design; idempotence is the simplest solution when you can use it.
6. Wake-one semantics to avoid the thundering herd are important for performance when you have multiple threads, and Gunicorn has that thundering herd problem, so you probably don't want to be running it this way on a 64-core box with hyperthreading. (The problem is of course less severe than it was for Apache because the green threads don't thunder.)
7. Gevent uses epoll, not select, poll, or RT signals
8. EAGAIN and SIGPIPE if you didn't know about those. (Somebody is in today's lucky ten thousand.)
9. What kinds of mechanisms “tend to show up given time in a battle-tested [network server] system.”
10. Your systems don't have to be fragile pieces of shit.
I'm not sure whether the person I was replying to is The One to whom all these things are too obvious to be worth mentioning, or if these were too implicit for them to notice, or a combination. Either way, no, thank you for writing it.
The takeaway should be: don't do green threads/event loops for anything that involves any kind of non-trivial processing; or, even better, don't do that unless you really need to do such things (and "better performance" is not a valid reason).
One of the people who designed Protobuf has criticized it (Edit: to clarify, one of the authors of v2, see below), so that doesn't really inspire much confidence in it for me.[1] Your general point is correct though, there's much more to a well designed RPC system than what HTTP based systems can do, but protobuf/gRPC is very much lacking in ideas that are decades old at this point, like promise pipelining, etc.
Also, I feel like she (intentionally maybe) is conflating concurrency and parallelism. These "green thread" systems provide concurrency, but not parallelism. That should be something people are aware of when they use them.
> One of the people who designed Protobuf has said it's awful
Hmm, you seem to be citing me. To clarify:
1. I didn't design Protobuf. I just rewrote the implementation (created version 2) and open sourced it.
2. I don't think it's awful. In fact, I think it's best-of-breed for what it is, and certainly a much better choice than JSON-over-HTTP. Yes, in Cap'n Proto I've added a bunch of features which Protobuf doesn't have, like zero-copy and promise pipelining, and I obviously think these features make it better tech. But, to be fair, these ideas make Cap'n Proto more complicated, and whether that complexity is worth it is still very much unproven at the scale that Google uses Protobuf and gRPC/Stubby.
> These "green thread" systems provide concurrency, but not parallelism.
I'm not sure if these specific definitions of "concurrency" and "parallelism" are universal. I wasn't aware of them, at least.
Hi kentonv! I have a tangential question for you, if you don’t mind. I brought up your capnproto with a few friends at work, while chatting about the profiling data of our services (mostly CPU-bound, mostly on protobuf de-/encoding). After convincing ourselves that language-agnostic “zero-cost” requests weren’t completely magical, and that the whole promise thing is very useful, we got to wondering...
Do you think it’s possible, that gRPC/proto could evolve in a non-total-rewrite way to earn the benefits offered by capnproto? I figure you’d be best positioned to answer that kind of question, having worked so intimately on both! :)
We have enjoyed getting to know and using gRPC and proto, I also want to thank you for your work! capnproto is an inspiring solution to a prima facie unsolvable problem, I hope to see it succeed more universally, or at least inspire a proto4. :) Thank you again!
I think Protobuf fundamentally can't achieve zero-copy parsing without changing the underlying encoding. That said, zero-copy parsing only provides a significant real-world benefit in certain use cases. For the use case of RPC over a network -- especially over the internet -- zero-copy parsing has minimal benefit. The places where zero-copy parsing can be a big win are when it means you can mmap() a very large file, or for IPC in shared memory.
On the other hand, Promise Pipelining -- and, more generally, object-capability RPC -- could definitely be added to gRPC. In fact, the very first iteration of Cap'n Proto was a Protobuf-based RPC system that used the same service definition syntax that gRPC now uses. "Cap'n Proto" at the time meant "capabilities and protobuf". (That version of the project was short-lived and shares no code at all with the current Cap'n Proto.)
However, I don't think gRPC is likely to add ocaps unless and until the model proves itself by gaining wide popularity elsewhere. It doesn't make sense for gRPC to take the risk of adding a big, new, experimental feature which they'll be forced to support forever when the demand hasn't been proven yet. Ocaps are gaining a lot of popularity lately (with major new tech like Fuschia and WASI being capability-based) but I think there's still further to go before it would make sense for gRPC to adopt it.
I'm not entirely clear on whether this Github issue would bring feature-parity with Cap'n'proto but apparently Google already has a zero-copy API for Protocol Buffers internally: https://github.com/protocolbuffers/protobuf/issues/1896
I originally wrote the "zero copy" support in proto2, long before I created Cap'n Proto. What Protobuf means by "zero copy" is much more limited than what Cap'n Proto means. Protobuf's "zero copy" applies only to individual string (or "bytes") fields. The effect is that when you call the getter for that field, you get a pointer to the string inside the original message buffer, rather than a copy allocated on the heap. The overall message structure still needs to be parsed upfront and converted into an object tree on the heap (which I count as a copy).
Cap'n Proto is very different. Every Cap'n Proto object is actually a pointer into the original message buffer. Accessing one element of a large array is O(1) -- the previous (and subsequent) elements don't need to be examined at all. Similarly with structs, each field is located at a known fixed offset from the start of the struct, so can be accessed without examining other fields. Protobuf inherently cannot do this; there is no way to know where a field is located without first parsing all previous fields in the same message.
Thanks for the response, I was being a bit tongue in cheek. I don't think you actually think it's awful :) And thanks for clarifying that.
About concurrency vs. parallelism, I think it is fairly standard to think of them as two different concepts that overlap somewhat.
You can have concurrency with parallelism (e.g. pthreads, or M:N threading where you map "green threads" on to processes that can run in parallel). You can also have concurrency without parallelism. The difference between the two is that parallelism can be deterministic, whereas concurrency is always going to be non-deterministic.
> > These "green thread" systems provide concurrency, but not parallelism.
> I'm not sure if these specific definitions of "concurrency" and "parallelism" are universal. I wasn't aware of them, at least.
To be clear, since GP didn't define them: concurrency simulates parallelism through context switching. Context switching itself encompasses both cooperative multitasking (gevent does this) and preemptive multitasking (modern operating system threads when they're sharing a CPU).
AFAIK it is universal, but the distinction is close enough not to matter in most cases, so people get lazy with their words.
Those definitions have been coming into fashion in the last couple of decades. I think it's useful to have the distinction but I wish we had new words that didn't previously mean both things.
I think the main issue is it seems really one-sided and the intent was to be snarky, vs educational. I posted a comment here detailing some ways to work around some of the pitfalls. I think if she devoted more time in the article to solutions vs. complaining, her points would come across more productively.
This is a great review of what is going on "behind the scenes."
As the maintainer of about 5 little services with this structure I have vowed never to write another one. The memory overhead alone is a source of eternal irritation ("Surely there must be a better way....").
Echoing other commenters here, the real cost isn't actually discussed. Namely that there is a solution to some of these problems (re long running tasks?), but it carries with it a major increase in complexity. Its name is Celery and oh boy have fun with the ops overhead that that is going to induce.
A while back I did some unscientific benchmarking of the various worker classes for python3.6 and pypy3 (7.0 at the time I think?). Quoting my summary notes:
1. "pypy3 with sync worker has roughly the same performance, gevent is monstrously slow gthread is about 20 rps slower than sync (1s over 1k requests), sync can get up to ~150rps"
2. "pypy3 clearly faster with tornado than anything running 3.6"
3. "pypy3 is also about 4x faster when dumping nt straight from the database, peaking at about 80MBps to disk on the same computer while python3.6 hits ~20MBps"
I won't mention the workload because it was the same for both implementations and would only confuse the point, which is that there are better solutions out there in python land if you are stuck with one of these systems.
One thing I would love to hear from others is how other runtimes do this in a sane and performant way. What is the better solution left implicit in this post?
"I will not use web requests when the situation calls for RPCs"
I'm surprised how often devs treat this distinction as architecturally meaningful. Web requests are just RPCs with some of the parameters standardized and multiple surfaces for parameters and return values - query string, headers, body.
This is completely orthogonal to the strategy used to schedule IO, concurrency, etc.
I think what she's getting at is that RPC usually comes bundled with some kind of strictly typed serialization format and standard infrastructure (service discovery, dispatch, error handling, etc.). A lot of web frameworks just take one request and hand it to a function, and the rest, including decoding the loose JSON that might be in the body, is up to you.
RPC systems come in many different levels like programming languages. While there are low level RPC systems which are just a simple layer around web requests, there are others that do a lot more. They can do retries, host selection, and stateful operations like the article mentions.
These systems tend to take care of a bunch of easy-to-mess-up logic which tends to accumulate around any system that wants to send something and not mess it up. Any sufficiently old web request system tends to look identical to a high level RPC system designed badly.
So choosing an RPC system should give you all the features you’d eventually end up building around using web requests without spending your time rewriting it.
Thanks for posting this. I had the same reaction. I think making this distinction ends up muddling what an author means when they refer to RPC and ends up overloading the term.
REST constrains your architecture in ways that RPC doesn't. Of course you can use HTTP without using REST, but then you're paying a lot of the costs of REST without getting its benefits, and a simpler RPC protocol might be better.
Proxies, built-in retry semantics in many clients, code complexity for dealing with the mandatory flexibility in HTTP, the presumption that GET is safe for, say, prefetching or spidering, etc. I think REST is often worth the cost (I might be the only person on this thread who has written an HTTP server in assembly), but if you're not using REST, using HTTP will tend to cause you a lot of headaches you could have avoided, for little or no benefit.
GET requests are supposed to be idempotent and side-effect free. Sadly, too often, they are not. Internal state gets mutated, a query parameter gets stored and/or processed in a way that affects future reads, and so on.
But then again, nothing prevents you from writing RPC based service with those semantics either. It just might be that people who develop and maintain RPC services are better aware of the consequences of their actions.
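The anti-pattern in question is as mundane as something like this (Flask used purely for illustration; the route and names are made up):

    from flask import Flask

    app = Flask(__name__)
    state = {"promotions": 0}

    @app.route("/promote/<int:user_id>")      # GET by default
    def promote(user_id):
        state["promotions"] += 1              # side effect on a "read": a prefetcher,
        return {                              # spider, or retrying proxy now mutates data
            "user": user_id,
            "promotions": state["promotions"],
        }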
REST is conceptually easy, has human-readable on-the-wire payloads and operates on synchronous semantics.[×] The bar for entry is simply lower, allowing even an inexperienced developer to get going with very little friction.
The very fact that I can do this:
    import json, requests

    dada = json.loads(whatever)
    res = requests.post(URL, data=dada)
... means it's incredibly easy to get off the ground and shuffle data between services. Conversely, it's also easy to send the wrong data, in a wrong way.
×: webhooks are effectively callbacks, but instead of setting the code entry point for async return in your call, you expose a webhook route and treat it as any other request.
Your example code uses HTTP but not REST. This explains most of our apparent disagreement; what you are talking about when you say “REST” has very little to do with REST itself. REST is not conceptually easy; it's just that people frequently confuse it with HTTP.
I mean... you do have to parse text formats. HTTP parsing may be a solved problem, but that doesn't mean the overhead or complexity of doing so disappears.
Also, TLS is not really ideally lightweight for RPCs, but you should absolutely encrypt your RPC traffic (imo.) So I really think the whole stack is out.
(P.S.: If you are wondering what kinds of 'lightweight' replacements for TLS exist, I think my personal favorite attempt is CurveCP, although it is a bit dated nowadays. I wouldn't often recommend people roll their own, but you could certainly do something simple with NaCl/libsodium directly. Maybe QUIC also fits the bill?)
> that doesn't mean the overhead or complexity of doing so disappears
No, it doesn't. But there is no evidence this overhead actually mattered here. It usually doesn't because the CPUs easily outrun whatever bandwidth is available which is why JSON over HTTP is fine 99% of the time. There is absolutely nothing in this blog post that shows that's not also the case here. No rationale is provided as to how a strongly typed RPC mechanism would solve any actual problems the services is having.
So we're left with guesswork and the author's hang-ups about HTTP vs some as yet unnamed RPC solution.
Also, Gunicorn? Thundering herd? These are solved problems. Scrap the toy proxy and use something real like haproxy. At a minimum.
Finally, none of this griping about HTTP vs RPC actually addresses the _actual_ problem: the server can't process requests in a timely manner. That points to some deeper inefficiency or design issue that likely has nothing to do with Python or Gunicorn or Gevent at all. We're not given any insight as to what the hell the server is doing with all that CPU. Or why the client isn't using a protocol intended for long running processes; RPC schemes have timeouts too ya know....
HTTP is overly verbose, is a pain and slow to parse, and there are various interpretations of what the protocol spec allows and disallows. If machines are communicating, why should it be in a human-readable format? Binary is far quicker.
This is based on a number of obsolete premises. Contemporary HTTP techniques include[1] framed binary wire protocol, redundant header elimination, compressed header values and efficient cryptography. HTTP/2 + TLS 1.3+ are extremely efficient together and are difficult to improve upon. When compatibility, implementation quality and ubiquity[2] are considered they are effectively impossible to improve upon. Except...
If, in the unlikely case that your particular bit of brilliance is indeed hampered by the vestigial amount of overhead still present in contemporary HTTP, then you might do as Google and several other operations have done and dispense with traditional techniques (including TCP) altogether via QUIC.
Just because what you see in the Network tab of your browser's 'Developer tools' window looks like something from 2005 doesn't mean that's what is actually on the wire. It mostly isn't any longer as a share of global traffic.
But again, all of that is irrelevant; the post provided no evidence that replacing HTTP with some dubiously unnamed form of RPC would solve any actual problems. HTTP was tossed in the rant basket with a bunch of other things, few of which appeared relevant to the actual failure modes cited.
There is nothing that says that "RPC" can't do multiple request/response cycles over an existing open (and encrypted) connection rather than initiate a new one for every call, just like HTTP. Or even pipeline them like HTTP/2.0
Or even do RPC calls in both directions over one stream socket... essentially all the “inflexible” RPC protocols of the 90's can do that (and incidentally, the way this is usually implemented involves nested event loops)...
HTTP/1.1 only closes the connection if one party sends Connection: close. It was HTTP/1.0 that was one-shot. There was also pipelining support added, but apparently nobody bothered to implement it.
HTTP/2 is more reasonable. But by the time you get to HTTP/3 you're just doing HTTP/2 over QUIC. At which point, why not just send RPC payloads directly over QUIC?
That's circular. I can't argue anything from this point. "Why should I get an electric car when I have a good old gas car in the driveway?" I don't have an answer. I do have answers for why one is better than the other. APIs work over HTTP in spite of its limitations, not because of good synergy.

I think gRPC is the most reasonable implementation of such a thing (disclaimer: I work for Google, but not on gRPC, and I used gRPC before I worked at Google), but I still think it is overkill for many people. If you are using HTTP+REST+JSON and it works fine for what you are doing, then fine - there's an ecosystem already built around it.

But the kinds of things people do with lighter-weight and more efficient RPC layers literally aren't doable over standard HTTP/1.1 and REST. It enables stuff you wouldn't think of, when you can measure the absolute overhead in bytes. (As an example, I'm not aware of anyone actually doing this, but it would almost certainly be possible to forward low-level signals like USB or perhaps even PCI Express packets over a lightweight RPC layer, and get all of the encryption/access control/etc. you already have in your stack.)
Answers for why HTTP/1.1 is a poor fit:
- Text format requires text parsing. How long do you limit the header lines? What transport compression do you support? Text parsing is inefficient compared to binary formats.
- A lot of difficult to understand behavior. When do you send 100 Continue, what do you do when you receive it? What happens when you are on a keep-alive connection and there's no Content-length? (There's a whole flow chart for something simple like this.) etc.
- A lot of cruft. Chunked encoding is weird, for example. Trailers are also weird. What happens when a header is specified twice? (There's a small sketch of that last one below.)
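To make the duplicate-header point concrete, here's a tiny sketch using Python's stdlib HTTP header parser (the Content-Length values are made up for illustration):

    import io
    import http.client

    raw = io.BytesIO(b"Content-Length: 100\r\nContent-Length: 200\r\n\r\n")
    headers = http.client.parse_headers(raw)

    print(headers.get_all("Content-Length"))   # both values survive parsing
    print(headers["Content-Length"])           # plain lookup: which one? undefined
    # The spec says differing Content-Length values should be treated as an error;
    # plenty of stacks don't bother, which is where request-smuggling bugs live.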
Answers for why HTTP/2 is still a poor fit:
- What are the headers even for? You now have this entire section of your request that doesn't matter, with its own compression scheme called HPACK. Why?
- Server push. It's nice that you have bidirectional streams, but this is clearly designed for browser agents. gRPC repurposes this for bidirectional streaming as it should be, but...
- ...Oftentimes, hacks like that lead to the worst problem: you did all of this work to use HTTP as an RPC layer, and you can't even use it in a browser, because the sane things you do for your backend might not be compatible. In gRPC there's a special layer for handling this, but it's a lot of additional cruft.
HTTP/REST is great because there's a huge ecosystem, but that's not even a solid win, due to the complexity. As an example, years ago I ran into huge problems with Amazon ELB because it was buffering my entire request and response payloads, and imposing its own timeouts on top. All documented behavior, but you can't just plug in this HTTP thing and hope for it to work. Basically anything in the middle that also speaks HTTP has to be carefully configured. Again, leading to doubt over the whole point of using a protocol like HTTP. There are rules for what should be GET, PUT, POST, DELETE, and yet those all interact strangely: no payload in a GET body, some software gets weird about calls like DELETE, so sometimes you have to support POST for what should be a PUT, and so on.
And at the end of the day, all you really wanted was RPC payloads in both directions, and you have all of this crap around it, and it's largely just because web browsers exist, but none of this stuff even works well together.
It works OK if you don't really care much and just throw a software stack together, but that doesn't mean it will be efficient, doesn't mean you won't run into problems. I definitely prefer to go for simpler, and HTTP is not actually simpler. It just has the benefit of having an existing ecosystem.
I don't think we disagree about anything here: if you wanna optimize for maximal machine/network utilization, then optimize for that with gRPC or equivalent; if you wanna optimize for a lean stack and have to use HTTP anyway because you're on the web, then use (RPC over) HTTP - both can be considered more "efficient" depending on the setting and your constraints.
But the point was that contrasting web requests with RPC is a mistake of category and has little to do with various IO handling and concurrency models that the author was discussing.
Well, the thing is, I do agree with the author, though, on their point of not using web requests for RPCs. I think we must be interpreting the author's text differently.
Except she writes:
"
Then it does its weird userspace "thread" flip back to the original request's context and reads the response. It chews on this data, because, again, it's terrible JSON stuff and not anything reasonable like a strongly-typed, unambiguously, easily-deserialized message. The clock is ticking and ticking.
"
If she laments that it is bad design to do the deserialization on the IO thread, that is just as true for JSON as it is for protobuf or whatever "true RPC" format she considers worthy.
It is less true for formats that deserialize faster. I still don't see where she is confusing the two. At the very top, she explicitly notes them separately:
"I will not use web requests when the situation calls for RPCs. I will not use 'green' (userspace) 'threads' when there are actual OS-level threads and parallelization is necessary."
Sigh. Yes. I have been there and done that (more or less) and it sucks. The root problem is that data scientists really want to use Python for machine learning, but wrapping a Python model in a service that uses CPU and memory efficiently is really difficult.
Because of the GIL, you can't make predictions at the same time you're processing network IO, which means that you need multiple processes to respond to clients quickly and keep the CPU busy. But models use a lot of memory and so you can't run all THAT many processes.
I actually did get the load-then-fork, copy-on-write thing to work, but Python's garbage collections cause things to get moved around in memory and triggers copying and makes the processes gradually consume more and more memory as the model becomes less and less shared. Ok, so then you can terminate and re-fork the processes periodically, and avoid OOM errors, but there's still a lot of memory overhead and CPU usage is pretty low even when there are lots of clients waiting and...
You know I hear Julia is pretty mature these days and hey didn't Google release this nifty C++ library for ML and notebooks aren't THAT much easier. Between the GIL and the complete insanity that is python packaging, I think it's actually the worst possible language to use for ML.
She's talking about green threads, which are different from regular threading in Python. Under Node.js/Python-style green threads, only IO calls are concurrent with a single computation task. There is no parallelism under either style of threading, unless you count concurrent IO as parallel.
She is basically complaining about a pattern that was popularized by NodeJS and emulated in Python by older libraries like gevent, Twisted and Tornado. Currently Python 3 uses the async/await keywords as an API around the same concepts implemented in the older libraries.
In the case of the article, you are correct. I have a slightly different case where I'm wrapping scikit-learn model. We're NOT just calling another service and waiting for a response, we're doing computation, in Python. So the GIL is actually a problem.
> Because of the GIL, you can't make predictions at the same time you're processing network IO
Why not? If the model is a Python wrapper around some C/C++ library, then the GIL can be released, and this is actually the recommended approach used by almost every CPU-intensive Python library - https://docs.python.org/3/c-api/init.html#releasing-the-gil-... You can have parallel computations inside your wrapped C extension while the Python interpreter is processing IO.
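You can see the effect from pure Python with a rough sketch like this; hashlib is just a stand-in here for any C extension that drops the GIL while it works on a big input:

    import hashlib
    import time
    from concurrent.futures import ThreadPoolExecutor

    data = b"x" * (32 * 1024 * 1024)   # 32 MiB per hash; big enough that the
                                       # C code releases the GIL while hashing

    def cpu_task(_):
        return hashlib.sha256(data).hexdigest()

    for workers in (1, 4):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(cpu_task, range(8)))
        print(workers, "thread(s):", round(time.perf_counter() - start, 2), "s")

With the GIL released inside sha256, the 4-thread run finishes markedly faster than the 1-thread run; pure-Python CPU work in the same threads would show no speedup at all.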
This is spot on. My one and only gripe is with this part:
> So how do you keep this kind of monster running? First, you make sure you never allow it to use too much of the CPU, because empirically, it'll mean that you're getting distracted too much and are timing out some requests while chasing down others. You set your system to "elastically scale up" at some pitiful utilization level, like 25-30% of the entire machine.
Letting a Python web service, written in your framework of choice, perform CPU-bound work is just bad design. A Python web service should essentially be a router for data, controlling authentication/authorization, I/O formatting, and not much else. CPU-intensive tasks should be submitted to a worker queue and handled out of process. Since this is Python, we don't have the luxury of using threads to perform CPU-bound work (because of the Global Interpreter Lock).
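A minimal sketch of the idea; a real setup would more likely use a proper job queue (Celery, RQ, etc.) than a stdlib process pool, and score() is a made-up stand-in for the CPU-heavy work:

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def score(features):                 # stand-in for the CPU-heavy model call
        return sum(x * x for x in features)

    async def handle_request(pool, features):
        loop = asyncio.get_running_loop()
        # The web process only awaits the result; the CPU burn happens out of
        # process, so the event loop keeps serving other requests meanwhile.
        return await loop.run_in_executor(pool, score, features)

    async def main():
        with ProcessPoolExecutor(max_workers=4) as pool:   # sized to CPUs, not requests
            return await handle_request(pool, [1.0, 2.0, 3.0])

    if __name__ == "__main__":
        print(asyncio.run(main()))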
I like the author's articles most of the time. While this article contains some truths, I don't think it argues very persuasively for its conclusion. Okay, these parts of the Python ecosystem don't work well together, and it's a bad, unpolished experience. Fair, as with other criticisms of Python.
The question, however, is why one would use gevent at this point in Python's evolution. There's async await now, and things like FastAPI. If you want to use, say, the Django ecosystem, use Nginx and uWSGI and be done with it. Maybe you need to spend some more resources to deploy your Python. Okay. Is that a problem? Why are you using Python? Is it because it's quick to use and helps you solve problems faster with its gigantic, mature ecosystem that lets you focus on your business logic? Then this, while admittedly not great, is going to be a rounding error. Is it because you began using it in the aforementioned case and now you're boxed into an expensive corner and you need to figure out how to scale parts of your presumably useful production architecture serving a Very Useful Application?
Maybe you need to start splitting up your architecture into separate services, so that you can use Python for the things that it does well and use some other technology for the parts that aren't I/O bound and could benefit from that. But that's not what this article is about. This article is about someone making the wrong choices when better choices existed and then making a categorical decision against using Python for a service. I'd say that's what "we have to talk about", if you ask me.
I've been working on a legacy internal python system that suffers from most of the complaints here (and in the excellent COST paper Rachel links at the bottom).
The problems alluded to are, yes, solvable in python. But they also seem endemic in python systems.
When everyone who uses the tool uses it wrong, maybe it's not the user's fault.
(That said, I generally do think there's a time and place for Python systems or web apps. That time is generally when speed and maintainability are significantly less important than flexibility.)
> The problems alluded to are, yes, solvable in python. But they also seem endemic in python systems.
>
> When everyone who uses the tool uses it wrong, maybe it's not the user's fault.
Yes, though that doesn't mean it is necessarily the code's fault.
Honestly, I was very confused by this article, because I thought everyone understood what was going on, the trade-offs involved, and how that ought to impact your design decisions.
It's not that Gevent'd Gunicorn is intrinsically a bad thing. You're going for cooperative multi-tasking/concurrency, so no preemptive multi-tasking support. This creates potential challenges with fair scheduling if you have real-time constraints like timeouts... so you design accordingly.
One of the advantages of this model is you do indeed need less memory (and often a little less CPU) to handle high load levels. It's not like you are intrinsically better off if you use Python in a forking model. You can still end up so CPU bound that you timeout handling requests... the only difference is you'll get fairer splitting of the CPU's time across tasks. It can actually get worse if you get lost in an infinite series of context switches (yes, there are ways to mitigate this problem... although they can create fair scheduling problems... it's a natural tension), or worse still, start swapping.
If the notion that running out of CPU might mean you have timeouts hasn't occurred to you...
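One concrete way of "designing accordingly" under cooperative scheduling is to chunk the CPU work and yield back to the hub every so often. A rough sketch (assumes gevent is installed; expensive_step is a stand-in for the real work):

    import gevent

    def expensive_step(x):
        return x * x                     # stand-in for real CPU work

    def crunch(items):
        total = 0
        for i, item in enumerate(items):
            total += expensive_step(item)
            if i % 1000 == 0:
                gevent.sleep(0)          # cooperative yield: let other greenlets
                                         # (and their timeouts) get a turn
        return total

    print(gevent.spawn(crunch, range(100_000)).get())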
> When everyone who uses the tool uses it wrong, maybe it's not the user's fault.
I'm not the GP, but I guess that a tool that is
> quick to use and helps you solve problems faster with its gigantic, mature ecosystem that lets you focus on your business logic
can never cover all bases perfectly, and is generally great when starting out, but ultimately not built to be very forgiving when grown too much.
> now you're boxed into an expensive corner and you need to figure out how to scale
When you get to this point, and the requirements start to be more focused on performance, then it's time to start switching Python out. That does not devalue Python in the earlier stages of development and operation.
The point being that Python is the right tool for getting stuff working quickly, not for getting stuff executing quickly.
Agreed on both (a) I usually like the author’s articles and (b) think she’s missing the point on this one.
gevent and gunicorn were good attempts to remedy a bad situation. async/await is the solution that the Python community is coalescing around. Even with Django, there are active efforts to support ASGI. [1]
Gevent was doing it right, and the async syntax was a huge mistake that fractured community-contributed libraries into two incompatible camps, with lots of unnecessary cloning happening at the present moment.
In high-level languages with virtual machines and/or garbage collectors, the runtime system should be solely responsible for scheduling green threads around IO entry points, all without special syntactic markers. GHC has it right (https://www.aosabook.org/en/posa/warp.html), and Gevent was the right kind of development, with on-par async performance (https://gist.github.com/rfyiamcool/41d4004b7fd46516d0b4f34f6...) and a standard synchronous coding style. It could have been adopted into the core language and improved further without splitting the community.
I have run Python in its traditional synchronous form, using gevent, and with the more recent async/await syntax. I don’t hold this opinion strongly, but I do lean towards async/await syntax for the sake of explicit is better than implicit [1]. Node.js, which was asynchronous from the start, also separates async from sync explicitly with, for example, distinct fs.readFile() and fs.readFileSync() functions [2].
(Edit: Commenting only on clarity of syntax. Those performance metrics are interesting and I’ve admittedly never hit a scale where the difference has a practical impact.)
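For what it's worth, the contrast I have in mind looks roughly like the sketch below (two styles that would live in separate programs, shown together only for comparison; the fetch functions are made up):

    # gevent style: the code reads as synchronous; the yield to other greenlets
    # happens implicitly inside the blocking-looking socket calls.
    from gevent import socket as gsocket

    def head_request_gevent(host):
        s = gsocket.create_connection((host, 80))      # hidden cooperative yield
        s.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        return s.recv(1024)

    # asyncio style: every potential suspension point is spelled out with await.
    import asyncio

    async def head_request_async(host):
        reader, writer = await asyncio.open_connection(host, 80)   # explicit
        writer.write(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        await writer.drain()                                       # explicit
        return await reader.read(1024)                             # explicit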
> I don’t hold this opinion strongly, but do lean towards async/await syntax for the sake of explicit is better than implicit
I guess it's a question of where the line that defines "too implicit" should be drawn. I'm totally fine with implicit gevent yields, yet sometimes when I need to do heavy Python meta-programming, I wish things were more explicit around language semantics, namely everything around inheritance handling inside meta-classes (for instance, see the current implementation of enum.Enum).
What is the basis for that assertion? The "high-level VM with green threads" approach has been tried for a long time - most prominently, Java - and it just doesn't seem to stick.
For Python especially, it is problematic because it is a glue language more often than not, and VM-specific green threads are not good for cross-language interop. When you have promises and async/await around them, at ABI level it can all be mapped to a simple callback, which any language that has C FFI can handle. When you have green threads, every language in the picture has to be aware of them - and god forbid you have two different VMs with different notions of green threads interacting.
The fact that it's implemented in a runtime system I use nowadays.
> For Python especially, it is problematic
Shall we say it's a complex task instead of a problematic case?
> The "high-level VM with green threads" approach has been tried for a long time - most prominently, Java - and it just doesn't seem to stick.
AFAIK, it didn't stick because JNI with green threads needed to be scalable on SMP while the runtime implementation used a single thread, and a decision was then made to move to native threads. That doesn't necessarily indicate any inherent issues with VM-managed green threads (and CPython specifically cannot utilise SMP with its threads anyway). At least, this was mentioned in https://www.microsoft.com/en-us/research/publication/extendi... (Section 7).
> When you have promises and async/await around them, at ABI level it can all be mapped to a simple callback, which any language that has C FFI can handle.
Why wouldn't a VM be able to register those callbacks and bind them to a concrete OS thread when it knows that an FFI interop is going to happen? I don't see where explicit async/await is needed for that. It may require thread-safety markers (and that's what GHC's FFI interface has - https://wiki.haskell.org/GHC/Using_the_FFI#Improving_efficie...), but that's a different story from the invasive async syntax we have in contemporary Python.
It works nicely in Golang and Haskell. The main issue with Java and Python is that the core runtime developers, reasonably, do not wish to spend a lot of time developing equivalent systems.
I can't speak for Haskell, but inadequate performance of C FFI in Go is routinely mentioned as the reason why the community is so reluctant to wrap existing C libraries, rather than reimplementing them from scratch in Go.
To be completely honest, I don't know much about C interfaces or systems programming in general. Looking at benchmarks Go's FFI does indeed seem to perform pretty poorly. However, as a web dev, I find it works well for the concurrent programming tasks I find myself dealing with.
The ASGI spec came from the Django project as part of their Django Channels work. "There are active efforts to support ASGI in Django" is selling them a bit short, methinks.
I still use gevent, even for brand new projects. I work much faster with it than with async/await, and the performance appears to be comparable. I've tried getting used to async/await, but I find gevent much simpler to work with, in spite of the arguments made in places like https://glyph.twistedmatrix.com/2014/02/unyielding.html.
I wasn't aware of this particular inefficiency, but gevent is still fulfilling its purpose for me very well, and I see no reason to change. I like lightweight threads and thinking in terms of background jobs and dividing up work instead of remembering what things to annotate and when. I use locks if I need predictability. I like Python because I can develop quickly with it, and I can do so even faster with gevent while still getting more than enough performance.
Just because you don't understand the difference between gevent and asyncio, please don't post garbage laundry lists of your flavor of the month stack choices.
It's an amazing library and a very unique way to write cooperatively-scheduled applications. Best of all it works with existing libraries and doesn't require special "asyncio" implementations from top to bottom. It's not a silver bullet, but don't fool yourself that asyncio is because it's been blessed.
I think I understand the difference between gevent and asyncio pretty well. Moreover, it sounds like you understand the difference between community adoption and not, but you're fighting against community adoption with your own opinion of what is a "garbage laundry list" -- okay. You can say that. But, there's a reason the gevent approach is not what the community settled on.
What you call "special" asyncio implementations others would merely call obviously explicit code. Async/await is a powerful syntactic construct. I would never go back to gevent hell after using it.
It seems to be a complaint against doing process-per-CPU.
Let's say your server has 4 CPUs. The conservative option is to limit yourself to 4 requests at a time. But for most web applications, requests use tiny bursts of CPU in between longer spans of I/O, so your CPUs will be mostly idle.
Let's say we want to make better use of our CPUs and accept 40 requests at a time. Some environments (Java, Go, etc) allow any of the 40 requests to run on any of the CPUs. A request will have to wait only if 4+ of the 40 requests currently need to do CPU work.
Some environments (Node, Python, Ruby) allow a process to only use a single CPU at a time (roughly). You could run 40 processes, but that uses a lot of memory. The standard alternative is to do process-per-CPU; for this example we might run 4 processes and give each process 10 concurrent requests.
But now requests will have to wait if more than 1 of the 10 requests in its process needs to do CPU work. This has a higher probability of happening than "4+ out of 40". That's why this setup will result in higher latency.
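You can sanity-check that with a quick binomial back-of-the-envelope; the 5% figure below is an arbitrary assumption for how often an in-flight request needs the CPU at any given instant:

    from math import comb

    def p_more_than(k, n, p):
        # probability that more than k of n independent requests want CPU right now
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

    p = 0.05   # assume each in-flight request needs the CPU 5% of the time
    print("shared pool, 4 CPUs, 40 reqs:", round(p_more_than(4, 40, p), 3))  # ~0.05
    print("per-process, 1 CPU, 10 reqs :", round(p_more_than(1, 10, p), 3))  # ~0.09

Under that made-up 5% load, a request in the 10-per-process setup is roughly twice as likely to be stuck behind someone else's CPU burst as one in the shared pool.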
And there's a bunch more to it. For example, it's slightly more expensive (for cache/NUMA reasons) for a request to switch from one CPU to another, so some high-performance frameworks intentionally pin requests to CPUs, e.g. Nginx, Seastar. A "work-stealing" scheduler tries to strike a balance: requests are pinned to CPUs, but if a CPU is idle it can "steal" a request from another CPU.
The starvation/timeout problem described in the post is strictly more likely to happen in process-per-CPU, sure. But for a ton of web app workloads, the odds of it happening are low, and there are things you can do to improve the situation.
Using an ASGI server that supports async/await, such as Uvicorn, instead of green threads, forking, etc, seems like a good idea these days. Also means you can use Starlette which has a much nicer design IMO than some of the old frameworks.
Any of the above (or Sanic) can do ~3K RPS on a single core on a Raspberry Pi (which is where I test things for portability, optimisation and a little fun), and the RAM overhead is generally not that bad (I just did a little "hello world" uvicorn/blacksheep app and I see 22MB resident/10MB shared per worker, while one of my Clojure servers takes up over four times that...)
Those below who complain about the complaints are missing the point.
We (computer programmers as a general class) have not learnt from history. We keep reinventing wheels and each time they are heavier and clunkier.
What we used to do in 40K of scripts now takes two gigabytes in python/django/whateverthehellelse. E.g. mail list servers. Mailman3 hang your head in shame!
> "Why in the hell would you fork then load, instead of load then fork?"
In Python it often seems to make little difference. The continual refcount incrementing and decrementing sooner or later touches most everything and causes the copy to happen whether you're mutating an object or not.
I've had some broad thoughts about how one would give cpython the ability to "turn off" gc and refcounting for some "forever" objects which you know you're never going to want to free, but it wouldn't be pretty as it would require segregating these objects into their own arenas to prevent neighbour writes dirtying the whole page anyway...
They took a step towards this with https://docs.python.org/3/library/gc.html#gc.freeze but it doesn't go as far as disabling refcount touching outright. I've experimented with doing that, both per-object and just globally, and the results really were promising if your forkserver can keep up with providing the necessarily much shorter-lived worker processes.
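The basic recipe looks roughly like this sketch (POSIX-only because of os.fork; load_model is a stand-in for whatever actually loads your model):

    import gc
    import os
    import time

    def load_model():
        return [float(i) for i in range(1_000_000)]   # stand-in for the real model

    model = load_model()    # load once, in the parent
    gc.disable()            # stop collection passes from rewriting object headers
    gc.freeze()             # everything alive right now goes into a "permanent"
                            # generation the collector never scans again

    for _ in range(4):
        if os.fork() == 0:
            # child worker: reads `model` through copy-on-write pages; refcount
            # updates still dirty some pages, so this is a mitigation, not a cure
            time.sleep(1)
            os._exit(0)

    for _ in range(4):
        os.wait()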
Thanks for this link - I had completely missed it (I think I was just expecting to disable gc entirely or perform some rudimentary surgery on its linked list)
This isn't quite the same thing, but it is one of the articles that spurred my thoughts on this subject. In cpython, gc != refcounting. Instagram were talking about disabling gc, which would have stopped objects which they weren't using from being falsely copied, but wouldn't have stopped objects that they were using (but not mutating) from being copied.
I think this conflates a poor implementation of a webserver with python/gunicorn/gevent being bad. There are a few (easy) things to do to avoid some of the pitfalls she encountered:
> A connection arrives on the socket. Linux runs a pass down the list of listeners doing the epoll thing -- all of them! -- and tells every single one of them that something's waiting out there. They each wake up, one after another, a few nanoseconds apart.
Linux is known to have poor fairness with multiple processes listening to the same socket. For most setups that require forking a process, you run a local loadbalancer on box, whether it's haproxy or something else, and have each process listen on its own port. This not only allows you to ensure fairness by whatever load balance policy you want, but also lets you have healthchecks, queueing, etc.
>Meanwhile, that original request is getting old. The request it made has since received a response, but since there's not been an opportunity to flip back to it, the new request is still cooking. Eventually, that new request's computations are done, and it sends back a reply: 200 HTTP/1.1 OK, blah blah blah.
This can happen whether it's an os threaded design or a userspace green-thread runtime. If a process is overloaded, clients can and will timeout on the request. The main difference is in a green-thread runtime it's about overloading the process vs. utilizing all threads. Can make this better by using a local load balancer on box and spreading load evenly. It's also best practice to minimize "blocking" in the application that causes these pauses to happen.
>That's why they fork-then-load. That's why it takes up so much memory, and that's why you can't just have a bunch of these stupid things hanging around, each handling one request at a time and not pulling a "SHINYTHING!" and ignoring one just because another came in. There's just not enough RAM on the machine to let you do this. So, num_cpus + 1 it is.
Delayed imports (because of cyclical dependencies) are a bad practice. That being said, forking N processes is standard for languages/runtimes that can only utilize a single core (Python, Ruby, JavaScript, etc.).
This is not to say that this solution is ideal -- just that with a small bit of work you can improve the scalability/reliability/behavior under load of these systems by quite a bit.
The problem being described here isn't Python, gunicorn, or gevent; it's bad programming. I'd be willing to bet there are systems out there written in C++, Java, and Ruby that do the same dumb things. The solution is to not do dumb things--to understand what your program is doing. It's perfectly possible to do that in Python, gunicorn, and gevent.
In the case of Java, the Selector API was introduced in Java 4 (2002) for this exact reason: to avoid having all the threads wait on, and be notified by, accept().
In this crap situation atm, can attest.
Currently maintaining a Python app for the delicate snowflakes whose years of math understanding somehow prevent them from being able to learn a language that isn’t Python.
I really like Rachel's blog and I think I understand the point she's making here. However I think she sees it from the point of view of very large scale services. In many cases you can have a solution ready more quickly with less developer time if you use these technologies, and at smaller scale this more than pays for the additional hardware you need to cope with the inefficiency. In such cases writing services in python is pragmatic and sensible.
Thanks for the pointer. I was messing around with --preload and --timeout flags and they seemed to work, although I think that isn't fixing the root problem.
I’m not sure what the main point of the article is. Telling us that event loops have problems? Sure, the lack of preemption can cause latency problems in some tasks. But native threads have other issues - that’s why people use event loops.
Is the message that epoll and co are not efficient enough? That’s also true. The API problems and the thundering herd are known, and not only limited to Python applications as users. IO-completion-based models (e.g. through io_uring) solve some of the issues.
Or is this mainly about Python and/or Gevent? If yes, then I don’t understand it, since the described issues can be found in the same way in libuv, node.js, Rust, Netty, etc.
I can relate to the writer; working with legacy sucks. That was my main take on the blog post; the others are just brilliant ways of rationalizing why other people suck and why there are other people than me.
Definitely, I am smarter than the guy who wrote this, because then I wouldn't have these problems (or he is smarter and I just didn't ask him about his rationale).
What I design wouldn't run into these BS problems that I have to fix; it just wouldn't run into problems generally. (Or it would have more problems than this one.)
I had these conversations with myself at least a thousand times, and then it was just the case in the parentheses.
In this very particular sense, almost anything else is better. Dynamic scripting languages that are intrinsically single threaded because they were single-threaded for the first 10-15 years of their lives, and it is virtually impossible to retrofit true threading all the way from their basic runtimes through all their libraries [1], are basically the pessimal case for this particular problem.
This is not the whole story of the value of those languages. As the article even alludes to, at small loads or with lots of care this can be made to "work". But it is something that an engineer should know about them before picking them up and using a tool for something it isn't really good at.
[1]: I add this caveat because I don't think there's anything about dynamic scripting languages that makes them intrinsically difficult to thread any moreso than any other category of language, it's just that by an accident of history, they all come to us from the 1990s personal computer world, and they all spent at least a decade cooking and setting and building libraries and communities and developer skillsets before a serious need for threading was even on the horizon.
It's a good caveat, because Lua, in particular, has fully-reentrant functions. You can run a bunch of lua_States cooperatively or on a threaded basis without problems. Everything the VM does, from the C side, receives a lua_State as the first argument.
It's intrinsically single-threaded, yes. But each instance is quite small and they stay out of each others way. Add coroutines and there's a lot you can safely do with Lua that's a real pain to accomplish with Python.
In the days of yore, I might've attempted to shoe-horn Tcl in such a situation. Decent event-loop for distributing tasks and (like Lua, but unlike Python/Ruby) more eager to escape to C for performance-sensitive tasks.
> I don't think there's anything about dynamic scripting languages that makes them intrinsically difficult to thread any moreso than any other category of language
I think there might be some intrinsic factors:
1. Languages like Python don't want to expose the program to undefined behavior. Defining some trivial class, and then manipulating it from two threads at the same time, is not supposed to crash the program or introduce horrible security issues.
2. Languages like Python have "property bag" objects. That means (someone who knows better should check me on this) that most writes in a typical program are hash table operations, rather than primitive stores of an int or whatever. Locking each table separately, or using a fully atomic implementation, can be a significant slowdown in single-threaded programs, compared to using a GIL.
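A tiny illustration of that second point:

    class Point:
        pass

    p = Point()
    p.x = 1                  # each attribute write is really a hash-table insert
    p.y = 2
    print(p.__dict__)        # {'x': 1, 'y': 2} -- the attributes live in a dict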
I think if you wrote a new dynamic scripting language from scratch with the intention that it be threaded that you could probably come up with something. Python is blocked from it not because it is impossible, but because they've been unwilling to orphan all their C extensions.
The problem is that while it may raise hackles when stated so bluntly, dynamic scripting languages are generally on their way out anyhow (although they have a ways to go before that is even generally recognized, and even longer before they're legacy; we're talking plural decades for the whole process here), and also, fighting really hard to get threading into a language that is also intrinsically slow is just not the sort of compelling end-point that would inspire an author to create it, and a community to back it up. Why would you want a "threaded scripting language" that literally takes a 32-core machine to catch up to what numerous languages would be capable of doing on a single core? (A new language will have a long ways to go before it can match a good JS VM. Look at Perl 6/Raku's performance history.) Especially since such a language would be racing things like Nim, Crystal, a D revitalization, and several other contenders who are getting 90% of the convenience of scripting languages while getting 90% (or more) of the performance of compiled ones.
You don't necessarily need one; it depends on what kind of application you have and what its bottleneck is. If your application is network I/O bound, event-driven asynchronous I/O, which is what this article is describing, works fine, and a single process/thread even in a dynamic language like Python can handle a large volume of requests.
The specific issue this article is describing is due to a particular poor implementation of event-driven asynchronous I/O, not a general problem with the entire concept.
If your application is CPU bound, then yes, you need to use threads (or multiple processes), and you shouldn't be trying to mix event-driven asynchronous I/O with that.
Anecdotally, seems like a lot of people are using Go, or maybe Elixir to keep the dynamic typed development experience but much more efficient hardware utilization.
I really think this should be solved at the OS level. Why is it so hard to implement kernel threads in an efficient way? Threading shouldn't need to be done in userspace.
It's not hard. The very start of the article says she trusts proper kernel threads more. But a bunch of these languages were designed and written in a time before threads were much of a thing. So they fake threads in user mode rather than fix the assumptions of the runtime.
Now, using kernel threads uncarefully will lead to different problems, which are famously tricky... But what I mean is, it is not "omg kernel threads are hard" that is the primary factor preventing direct access to them. It's that the language runtime has a lot of baggage.
It's the same thing. She and many other people don't know it, but she's complaining about the same model that was introduced and popularized by NodeJS.
Current Python "green threads" use the async/await keywords as an API. Underneath this API, the state-of-the-art implementations use libuv (wrapped in a Python library called uvloop), the exact SAME C library that powers nodejs.
I find this post to be unintelligible. Given that it's been upvoted to the top of HN though, can someone TL;DR of the intellectual value of this post? It seems to be stepping through the details of what is going on while also being rambling.
Gotta love those people who fail to understand how things are supposed to be used, fail miserably as a result, then throw the baby out with the bathwater in a fit of tantrum.
Yes, Python has a GIL. Yes, lightweight threads are mostly good for IO bound tasks. Yes it can still be used effectively if you design your app correctly.
Rachel's posts would be so much more useful if she would just say what she meant, instead of twisting everything into knots to find a way to say it backwards so she can be sarcastic and condescending while doing it.
I'm sure there's some useful information in here, but it's not worth digging through the patronization to find it.
Wow, I’m glad I’m not the only one that feels this way. The sheer amount of “everyone is stupid except for people like me” is astounding. I’d love to see an article on the same topic breaking down what is wrong (by showing the code) and then explaining the “right” way to do it, with code.
What are you talking about? Male? Is the author female? I honestly have no idea who the author is. But I don’t care if the author is male or female. Condescending is condescending.
I read this as a dev war story, and as a person venting.
I would have the same condescending sarcastic attitude while doing it if I was venting too.
I also know a lot of people who like my sarcasm when talking about topics like this, so yeah, as a guy who very much gets where the author is coming from, I agree this seems like a double standard.
The sarcastic and condescending tone is what makes it entertaining to read. I'm pretty sure you can find plenty of information on performance-tuning Python in IBM whitepapers, if that's what floats your boat.
I'd read it if it was a tutorial, but I'd read it when I hit performance issues in my Python webservice and it was Google's top result for my problem - not when it hit top 10 on HN.
Statistically speaking, maybe that's the same as me not reading it at all in 90% of universes.
It’s an interesting post to read. Have another go if you can, but I very much agree with you on the tone issue. Imagine if these were one's own notes that had to be read through the next time something like this happened. A more succinct, operational — dare I say: positive! — way of writing would really be welcome.
There is a lot of literature on the subject if you want more pragmatic notes. You read Rachel’s blog not only for the tech experience, but for her storytelling skills.
I particularly enjoy her blog.
The problem is that a solution for I/O bound workloads has become generalized as the solution for all concurrency needs when in reality, that’s just half the picture.
She mentioned a hell of a lot of googlable terms: epoll_wait, Apache thundering herd, EPOLLONESHOT, EAGAIN, idempotent requests, userspace threads, copy-on-write, queue depth determination, selective LIFO, strongly typed RPC, ...
Presumably “queue depth determination”, another new term for me, means determining how big the queue of pending requests for a service is allowed to get before further requests are refused (another load shedding measure) rather than being enqueued.
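Roughly, I take that to mean something like this sketch (the depth of 64 is a made-up number; in practice it would come from measurement):

    import queue

    MAX_QUEUE_DEPTH = 64            # the "determined" depth; a made-up number
    pending = queue.Queue(maxsize=MAX_QUEUE_DEPTH)

    def accept(request):
        try:
            pending.put_nowait(request)    # room left: enqueue as usual
            return "202 queued"
        except queue.Full:
            return "503 shed"              # refuse now rather than time out later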
I would counter-argue that dry, positive, informational writing is great for Wikipedia but can also be very boring. This blog has a lot of snark and that's what makes digesting the great information so much fun!
There's a middle ground, and you can avoid dryness with tones other than condescension. While I always read Rachel's posts whenever they come up because they're jam-packed with wisdom, I always find them a bit off-putting.
If I could put a point on it, it would be the implied entitlement and absence of gratitude. Sure, this architecture is not 100% efficient. But step back for a moment, take a breath, and consider the number of human-hours spent to get it where it is today. Consider how many people are busting their humps, many as volunteers, to keep improving it. We are not _owed_ any of this. Just the miracle of elastic server config and multicore processors... Buying into the pessimistic viewpoint is dangerous: when these issues get improved, will we feel grateful and adequate? Or will we find new flaws and get snarky about them?
Anyway, what I do really like about this post is that it shows the chain of technical details across the call chain. It connects together info from dozens of man pages, etc. I also appreciate how it points out that the inefficiency is quite convenient for service providers.
> Consider how many people are busting their humps, many volunteers, to keep improving it.
I think criticism about gratitude is strange when the author is pretty clearly coming from the standpoint that it was a bad idea to use this in the first place (and, to be fair, with regards to Python specifically, I tend towards that standpoint myself). From that standpoint, that labor begins to look like it's being set on fire. No Purple Hearts for self-inflicted wounds and all that.
For the downvoters - I also do not like Paul Graham and Sam Altman - they're the same as Rachel in every way. Little substance, lots of unsubstantiated filler material.
To extend this further, I also don't like the New Yorker for this reason alone - I don't have time for convoluted novel-like stories that have the important bit buried somewhere in the middle of 6 pages. If I want to read beautiful and creative prose, I need to be in that mindset. Not when discussing Python innards.
Her posts are entertaining. They aren't intended to be technical resources even though the topics are technical. If you aren't entertained by her style, move on.
She recently posted about which of her posts were most referenced by others, which caused all of those posts to be resubmitted despite being years old.
I would say it's a bad post because the conclusion is wrong.
According to the post, "the thing" with Python/Gunicorn/Gevent is it's less performant than one would like in some circumstances, and a lazy developer might tell you you need to 'set your system to "elastically scale up" at some pitiful utilization level, like 25-30% of the entire machine'.
That's probably true! But not that helpful if you don't say what those circumstances are. There are many circumstances where Python is appropriate for a web service, and many circumstances where green threads work just fine. Tell me when I need to consider using the more complicated solution, don't tell me the simpler solution is always useless and doomed from the start.
I'm really glad to see this as the top comment. I came back to comments after reading halfway to see if I was the only one struggling to extract any meaningful point from this.
I didn't find it patronising. Maybe those with ESL (English as a second language), used to more formal and more stratified usage, might not be as comfortable with it.
It actually gives a lot of background on why it doesn't work. Otherwise the post could just be:
tl;dr: gunicorn doesn't know how to multiplex listeners and green threads will ruin your request latency.
The thing is, that post isn't very useful or interesting. The point here is that the "simple" Python architecture doesn't scale well at all, so you might not want to use it if you're planning on scaling ever.
My feeling is that if you really wanted the author to improve, you would try to connect personally, establish trust and then talk to them privately about ways you feel they could improve their writing. And maybe you aren't comfortable reaching out privately because it's a woman, so that could go sideways (assuming you are male, which I don't actually know). Let me assure you that if you have good intentions, publicly dogging someone because you aren't comfortable reaching out privately is not a good substitute.
Seeing this comment at the top of the page on the highest ranked post on the HN front page for the only woman programmer that I am personally aware of who regularly makes the front page really feels like a kick in the gut and looks like sexist garbage. And I would like to think better of HN than that.
This is the Internet, where you take whatever feedback you get with the appropriate grain of salt and either choose to improve or not. Most people on the public venues on the Internet - forums, blogs, comments, essays - are not looking to build relationships or establish trust. (There are some exceptions - I've made some great friendships with Internet friends - but they're usually more private niche forums than blogs or other publications with a wide readership.) They're looking to get their opinion out there, build a readership, perhaps influence public discourse, and maybe get some feedback on their ideas.
I've seen similar comments leveled at PG [1] and Zed Shaw [2], so I don't think it's just sexism.
b) not the primary conversation around these two authors
Look to Linus Torvalds for a male example where delivery rather than content is often the primary conversation. That is how egregious the delivery must be for a male to get the tone police called on them
You may have something there. I went looking for the HN comments on Dabblers and Blowhards [1] (which, IMHO, is even more egregiously sanctimonious than Rachel's essay), but the top comment there was responding to content rather than tone. Only the 3 bottom-most comments remarked on tone.
I don't consider Linus Torvalds that vitriolic, BTW. Most of the time when he's angry he's trying to make a point. I think of Erik Naggum [2] or Poul-Henning Kamp when it comes to real vitriol on the Internet.
People have valid criticisms of Linus's delivery, but the content is often good. I tend to remember some of the technical arguments in those rants years after the fact, and cite them.
Keep in mind he did create the Linux kernel and git, so even if he delivers them inexpertly, even on a bad day, he has some technical insight.
All that said: I agree there is some gender bias showing up on this thread.
Oh, of course! If Linus wasn't special nobody would tolerate his style. Women have to be special for society to tolerate sarcasm from them. I'm unsure how old I'll be before a woman like Linus will be recognized rather than shoved aside.
Thank you for your many excellent comments in this discussion. You fought the good fight. You basically won in that this thread long ago ceased being at the top of the page.
Take your winnings and go home. Linus is not above social censure. His team reined him in not hugely long ago and the comment you are hissing at agrees with your larger point that there's some gender bias happening here and was uncommonly reasonable and evenhanded. I upvoted it.
I'm trying to be supportive. I'm trying to tell you "You've done enough. Relax. Take a break. Feel okay about how this went down."
I mean if your mom is dying of cancer or something and screaming at internet strangers is good distraction from more serious problems, cool. Don't let me stop you.
But if the point was "Doreen is right: this thread shouldn't be at the top of the page!" well, it's not anymore. Job well done. Have a cold brew or whatever and feel okay about it.
I'm not sure why my comment is interpreted as hissing/criticism. It was intended as elaboration and agreement. Oh well, people seem to have not liked it so I'll reconsider those types of posts in the future
I think it's an excellent point that a woman with equivalent chops as Linus is less likely to be recognized for it. So I am glad you replied. Thank you.
In part because of the larger context. In part because it sounds like sarcasm, not like you are genuinely agreeing that Linus actually deserves special treatment because of his stature.
I've defended Linus once or twice. I'm also glad he chose to take some time off and rethink things.
I can't think of any women we give similar accommodation to. That doesn't mean they don't exist. But the reality is that Linus is in a league all his own. It just sounds catty to make comparisons to him in that fashion.
I imagine if we genuinely had a "female Einstein," she would be pretty unique and would carve out her own unique relationship to the world at large.
Interesting. I suppose it can be read that way and I'll try to be more clear in the future.
My point is that we do have examples of female excellence, but almost invariably they are not uncouth. It seems more likely to me that the uncouth ones are silenced than that only male excellence can come in a brusque box
Janet Reno used to refer to herself as an awkward old maid to acknowledge her lack of smoothness and more or less dismiss such criticisms. Depending on your age, that might be before your time.
I'm short of sleep. I really don't desire to continue this discussion. I only spoke up because you seemed really frustrated and I wanted you to feel okay about how things went and that's apparently not your takeaway at all from my comment.
If you really want to shut me down and make me look like an absolute fool, you could list off the ten other women programmers who routinely hit the front page of HN that silly, pathetic little ole me completely missed.
I'm not getting into this argument about how it's not sexism because (bs example pretending men and women get treated exactly alike when everyone knows that's absolutely not true).
It's not about "wanting the author to improve". People are free to write on their own personal blog with as much snark as they like, in whatever style they prefer.
However, what seems to have happened here is that a bunch of folks are upvoting this link to the top of HN because of who the author is.
Meanwhile, other HN readers find this particular post to be a waste of time because, frankly, the content of the post itself is not particularly interesting or useful for most HN readers. Other posts[1][2] by this author have been much better suited for the top of HN, for example
I think that kinda hits the nail on the head. For a lot of people on HN the information being presented in the posts isn't new or particularly insightful. So to read the information presented in a tone where the author believes they are the only ones with the "true knowledge" can be very off-putting.
But of course there are other people that may get more out of it and not have a negative reaction to it.
I agree that the previous post was not constructive or effective, but there's nothing sexist about what they said. It's just someone broadcasting critical opinion.
Policing tone is far more prevalent when the speaker is a woman. Perhaps (I'm doubtful) this feedback would be given to a very well known male speaker, but it would not have been the top comment here.
Criticism is a staple of any human discussion/forum. It's not even feedback, it's just complaining about TFA. Sometimes the top comment on HN is someone complaining about the font-color of the blog post. Let's not lose our minds here.
Seems weird to get worked up over spotting a complaint on HN just because a woman wrote TFA. And btw, most complaints on HN are leveled at men, simply because men populate this forum and tech more than women. Does that mean this forum hates men? Why is it assumed men can handle it but women can't?
I have to wonder how many women are turned off by the idea that they need to be babied like this and can't take generic online criticism. Or the suggestion that criticism was only leveled at them because they are women. It sure reeks, to me.
Having different standards for different speakers is exactly what you're doing.
Criticizing how a message is delivered is standard HN criticism. Especially the sort of "everyone is stupid except for me" tone of TFA. I myself criticize commenters here for that, as it's something I can't stand either.
Why would you think it's something we only see leveled at women here? And, according to what? And, yes, you're then infantilizing women when OP does receive that criticism. I think your heart is in the right place, but you're doing exactly what you think you're condemning.
I don't think this type of criticism is only leveled at women. I think it is
a) much more likely to happen for much softer offenses
b) much more likely to become the primary conversation rather than an aside buried three levels deep in the comments
Perhaps in this instance Rachel's rhetoric was so off-putting that it really deserved top billing for conversation/criticism here. But that doesn't ring true for me, and I sincerely doubt the conversation/top post would be the same if instead written by e.g. Carmack
It's a good rant, but it's still a rant. Don't make it to something it's not, plenty of rants get harsher critiques and or don't receive that much up-votes.
There are few blog posts that make it to the front page of hacker news that don't draw sharp criticism in the comments and that criticism is quite often the top comment.
> “When a woman says it, it doesn’t sound as crazy,” said Maria Guadalupe, a professor at France’s INSEAD Business school and a co-creator with Joe Salvatore, clinical associate professor of educational theatre at New York University’s Steinhardt School, of the play.
Hmmm.
Is your conclusion based on actors reading off lines, or real life tone policing?
Maybe if it's a "natural experiment" it could be that women know they'll be held to a more tolerant standard (by most people) so they can get away with being a bit ruder. Or maybe they don't know the standard is more tolerant for women (they might even think they're being oppressed) but know where the line is where a crowd will turn against them (like most people do), and that line happens to allow them to be a little ruder.
Interesting experiment! I wasn't aware of it. I think it's difficult to extrapolate results, but I definitely have different takeaways than you.
1) The smiling aspect is explained (for me) by society pushing women to constantly smile, but not men. The amount we expect men and women to smile is different and when they violate those norms they're either a bitch (women for too little) or fake (men for too much).
I'm not sure how to interpret the tone aspect, and it's super interesting! It definitely flies in the face of multitudes of studies showing the reverse. I'm inclined to believe the studies which are really quite simple e.g. have people grade a short essay where the only difference between groups is the essay author's name.
It'd be interesting to get the sentiment on Torvalds's history of blog posts that make Hacker News. Willing to bet my paycheck that his sarcastic and ranty tone was loved.
What you're saying with this post and the one below is that criticizing a woman (even in a situation where women are underrepresented) is sexist. This is obviously not true and if you believe the criticism is unwarranted then you should make your own criticism based on those points.
That's not what I'm saying at all and it's dumbfounding to me that I am getting such a pile on to try to shut me up by probably all men trying to claim there's no sexism here.
My framing actually assumes positive intent gone wrong and suggests that if there is positive intent, this is not a best practice.
Entire audience hears "Some whiny bitch reading in sexism where there is none and that needs to be shut down cuz reasons."
And therein lies the problem.
But I promised myself I wasn't going to be dragged into some shitshow. I knew no matter how carefully I worded it, it was likely to get ugly pushback and not get good faith engagement.
DoreenMichele is actually much less ideological than most. Maybe it's not obvious from this thread, but those of us who have read her in that past know that her thoughts on these topics are actually unpredictable (and in particular, not at all anti-men). That's quite unusual. I'd give her the benefit of the doubt.
Much less ideological than most what? Most women who call attention to them being women on the Internet? Or just most people who post interesting technical stuff online?
Than most people who comment about gender issues on the internet. I find that once you have a few bits of information, you can nearly always predict where someone is going to come down on such matters. It's not common to run into someone who's less predictable that way.
Wholeheartedly agree with this. I expected to see the parent comment, but I'm really sad to see it at the top. Of course that's not the commenters fault per se; it's clearly a very common opinion people are happy/eager to communicate rather than be ashamed of (even if so mildly as to prefer not to have it in their upvote collection).
There's a huge amount of technical jargon and sarcasm that makes it hard to see her point.
Basically she's saying that Python async (whose current state-of-the-art implementation uses libuv, the same thing driving nodejs, and consequently suffers from the same "problems") doesn't have actual parallelism. Computations block, and concurrency only happens in one very specific case: IO. One computation can happen at a time, with several IO calls in flight, and a context switch can only happen when the computation makes an IO call.
She fails to see why this is good:
Python async and nodejs do not need concurrency primitives like locks. You cannot have a deadlock happen under this model period. (note I'm not talking about python threading, I'm talking about async/await)
This pattern was designed for simple pipeline programming for webapps where the webapp just does some minor translations and authentication then offloads the actual processing to an external computation engine (usually known as a database). This is where the real processing meat happens but most programmers just deal with this stuff through an API (usually called SQL). It's good to not have to deal with locks, mutexes, deadlocks and race conditions in the webapp. This is a huge benefit in terms of managing complexity which she completely discounts.
> Python async and nodejs do not need concurrency primitives like locks. You cannot have a deadlock happen under this model period.
This is dangerously wrong and I would suggest that you reconsider the steps that got you to this understanding because something really important has been lost. It is absolutely critical to understand that deadlocks are not why you have locks. Correctness during concurrent operation is why you have locks. Deadlocks are a failure state when you do not have correctness during concurrent operation. So are things like double-increment and double-create.
Parallelism does not imply deadlocking, concurrency implies deadlocking, and both NodeJS and Python are concurrent runtime environments. And I can guarantee you that, with a little skull sweat, you can write a deadlock in NodeJS or Python. It is very easy. If you need some help, here's a trivial example (and ordinarily I wouldn't use a link shortener here but this one is hefty, it just goes to the Typescript playground): https://bit.ly/2Tvjyze
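(Since the link above is shortened, here is a minimal sketch of the same class of bug, written with Python's asyncio rather than the TypeScript behind the link; the names are made up. Two tasks on a single thread acquire two locks in opposite order and wedge forever, no parallelism required.)

    import asyncio

    async def worker(first, second):
        async with first:
            await asyncio.sleep(0)   # yield, letting the other task grab its first lock
            async with second:       # now each task waits on the lock the other holds
                pass

    async def main():
        lock_a, lock_b = asyncio.Lock(), asyncio.Lock()
        try:
            await asyncio.wait_for(
                asyncio.gather(worker(lock_a, lock_b), worker(lock_b, lock_a)),
                timeout=1,
            )
        except asyncio.TimeoutError:
            print("deadlocked: one thread, no parallelism, still stuck")

    asyncio.run(main())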
Also, as a concrete, real-world, yes-it-happens-here example of where locking is important, consider that I've recently built a dependency injection framework in NodeJS--tried to use others' first, but my situation isn't covered by existing ones--and had to resort to a mutex to avoid double creation of objects within a single lifecycle. Creation of objects within this lifecycle happens asynchronously--it has to, as the act of creating the objects might itself rely on asynchronous operations. So, if I have a diamond dependency (A deps B and C, B and C dep D), I will non-deterministically, based on the creation times of B and C, create either one or two instances of D. I rely upon a mutex, keyed upon the dependency being created, to ensure that this does not happen.
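(The framework itself isn't shown here, but a rough asyncio sketch of the keyed-mutex idea being described looks like this: one lock per dependency key, with a re-check after acquiring, so concurrent resolutions of the same dependency build it exactly once.)

    import asyncio

    class Container:
        def __init__(self):
            self._instances = {}
            self._locks = {}                        # one asyncio.Lock per dependency key

        async def resolve(self, key, factory):
            if key in self._instances:              # fast path: already built
                return self._instances[key]
            lock = self._locks.setdefault(key, asyncio.Lock())
            async with lock:
                if key not in self._instances:      # re-check: another task may have won the race
                    self._instances[key] = await factory()
                return self._instances[key]

If B and C both resolve D while it is still being created, only one factory() call runs; without the lock, both would.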
I would also submit that perhaps you should adopt a principle of charity and think real hard about whether your priors are correct before you start talking about what she "fails" to see. Rachel is one of those people who has Been Around and while I also have Been Around, I understand that Rachel has Been Around More and I probably should be listening more than I should be smarming at her.
>Parallelism does not imply deadlocking, concurrency implies deadlocking, and both NodeJS and Python are concurrent runtime environments. And I can guarantee you that, with a little skull sweat, you can write a deadlock in NodeJS or Python. It is very easy. If you need some help, here's a trivial example: https://codesandbox.io/s/2wxvp
Ok technically you're right. I am completely wrong when I say it can NEVER happen.
But let's be real here, you introduced DEADLOCKING deliberately by introducing LOCKS and by doing context switching at weird places to make it happen. When nodejs came out one of the selling points was the lack of deadlocks and locks.
Case in point: there are no lock libraries in standard NodeJS.
Think about it, why is a LOCK needed here? Let's say you didn't have locks AT all. Wherever the heck you are the current Node task technically has what is equivalent to a LOCK on everything. Why? because all node instructions are atomic and single threaded. This is what replaces LOCKS in nodejs. Your code example is just strange. The only place where your example is relevant is if there was another process.
>But as a concrete example of where locking is important, consider that I've recently built a dependency injection framework in NodeJS--tried to use others' first, but my situation isn't covered by existing ones--
Probably because, again, nobody really programs using DI in node, let alone context switching and adding made-up locks in the middle of all these injections and constructions. Whatever you're doing is probably very unique or (maybe, I don't know your specific situation) a sign of over-engineering. DI is a very bad pattern and is one of the primary sources of technical debt in code (especially when it's over 2 layers deep and in a diamond configuration)... but that's another topic. Anyway...
What is the point of "acquiring" a lock if in node I have a "LOCK" on everything? It makes no sense; whatever it is you're doing, I am almost positive that there is a simpler way of doing it. Either way the dependency chain makes it obvious which needs to be created first and what can be created concurrently. The code below should produce what you're looking for without locks and with equivalent concurrency, which is one of the main selling points of single threaded async.
    const [b, c] = await Promise.all([B(), D().then(d => C(d))]);
    const a = await A(b, c);
B is evaluated with D async, C is kickstarted after D, with B still being evaluated async. All of this blocks until both B and C are complete then A evaluates. Whatever the heck you're doing with locks, things should happen in the same order as the dependency chain in both my code and one with locks. There's really no other order these things can be evaluated. I would even argue that my code is indeed the canonical way to handle your diamond problem in node, no lock code needed as expressed by the standard node library.
Think about it, node includes high level functions for http but none for locks which are an even lower level concept than http. It must mean you aren't supposed to use locks in Node.
I will say you are technically right in the fact that a deadlock CAN happen. I was wrong in saying it can NEVER happen, but you have to realize that I have a point here. Your example really goes very, very far out of its way to pull it off.
>I would also submit that perhaps you should adopt a principle of charity and think real hard about whether your priors are correct before you start talking about what she "fails" to see. Rachel is one of those people who has Been Around and while I also have Been Around, I understand that Rachel has Been Around More and I probably should be listening more than I should be smarming at her.
I was not smarming her, whatever smarming means, I am disagreeing with her just like you are disagreeing with me. There is NOTHING wrong with disagreeing with anybody. What is wrong is when you are proven wrong and you don't accept it. I accept that my statement of a deadlock NEVER happening in nodejs is categorically wrong.
"Being around" does not entitle you to anything. I hate it when people say this, nothing personal. Do you even know how long I've been around? Additionally, the overall main point of my post still stands, which you didn't even really address. I don't think Rachel gets the point of green threads. I think we can both agree I've made a strong point and maybe you should use your own charity principles on me.
NodeJS also doesn’t have a function to convert camel-case to PascalCase, should you not do that too because it’s not in the stdlib?
I’m going to be honest: you have not only not made a good point, you've gone out of your way to actively ignore that problems around concurrency regularly require one to use locks even in the absence of parallelism and have since long before multicore computers, and you're being weirdly hot-under-the-T-shirt besides.
Yeah well your post was rude and condescending. What did you expect with that attitude? Sure I'm angry, but there's nothing "weird" in my reaction given your rudeness.
>NodeJS also doesn’t have a function to convert camel-case to PascalCase, should you not do that too because it’s not in the stdlib?
This is entirely different from a highly concurrent framework not containing lock primitives. A critical primitive is missing. It's like a math library missing the addition operator.
>you've actively ignored that problems around concurrency regularly require one to use locks even in the absence of parallelism and have since long before multicore computers
We're not talking about multicore/singlecore stuff. We're talking about NodeJS and Python Async Await and standard usage patterns.
There are other patterns that need locks but those are typically reserved for programming things like databases... something that a typical web programmer who writes NodeJS or Python doesn't deal with as web servers follow a stateless pattern that considers the usage of global state as bad practice.
> We're not talking about multicore/singlecore stuff.
If you write a python or nodejs handler, stateless or not, that does two subsequent async operations involving changes on shared resources, such as a database table, you need locks, because another request may come in while the first is in wait.
Perhaps you're trying to say that this is irrelevant when you allow only one request at a time, but that's extremely limited and not the scenario under discussion.
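(A runnable sketch of exactly that failure mode; the "database" here is just a dict plus a simulated network delay, but the interleaving is the same one you get with a real driver.)

    import asyncio

    counts = {"page": 0}                   # stand-in for a shared table

    async def fetch_count(key):
        await asyncio.sleep(0.01)          # simulated round trip
        return counts[key]

    async def set_count(key, value):
        await asyncio.sleep(0.01)
        counts[key] = value

    async def handle_request(key):
        value = await fetch_count(key)     # another request can be scheduled during this await...
        await set_count(key, value + 1)    # ...so two handlers can both read 0 and both write 1

    async def main():
        await asyncio.gather(handle_request("page"), handle_request("page"))
        print(counts["page"])              # prints 1, not 2: one increment was lost

    asyncio.run(main())

Either a lock around the read-modify-write or pushing the arithmetic into the database itself (an atomic UPDATE) closes the window.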
Nah, I'm saying you shouldn't handle it in the web application itself, whether Python async/await or NodeJS.
You have to deal with deadlocks and race condition stuff in operations to the DB or any shared mutable state. That is obvious, but it is an external issue, because web developers deal with shared mutable state as an external service that lives outside of python or nodejs code.
I mean, if you count the transaction string or ORM as part of dealing with locks within your web framework, then sure, I guess you're right? The locks on the DB are DB primitives though, and not part of the web framework or language, so I would argue that it's different. I guess reordering updates to happen on primitives in the same order could count as a web application change, but that's kind of weak, as you're not really addressing the point:
NodeJS doesn't have locks, because you don't need locks in NodeJS and a deadlock should not occur unless you deliberately induce it.
The overall argument is that NodeJS and python async/await as frameworks are not designed for that kind of shared mutable state, hence the lack of locks in the nodejs std.
Also, never did I say a web developer doesn't need to understand or deal with concurrency. This is not true and I never claimed otherwise.
Additionally, thank you for being respectful. (Take note, eropple)
This is a really important point. Exporting your locks to Postgres means neither that they stop existing nor that you can't wedge yourself if the code isn't written by clever programmers who better understand concurrency.
Yes it is an important point. Postgresql does not stop deadlocks or race conditions from happening. You deal with those in Postgresql.
But this isn't the topic of the conversation, is it? The topic is locks and deadlocks in Python asyncio and NodeJS, so it's ultimately irrelevant to your initial example of the amateur diamond dependency injection, where normally no deadlock should be occurring regardless.
[I edited my post before his reply. Sorry for the confusion.]
Exporting your race conditions and washing your hands of them because the lock mechanism lives on the other end of a network socket rather than in your process space does not even rise to the level of “mere semantics”.
If you allow two NodeJS fibers to acquire remote locks—Redis redlocks, whatever—out of order by way of making asynchronous requests to it (noted only because you have a curious grip on that as being distinctive or meaningful here), you’ve still deadlocked and it is for all meaningful distinctions a deadlock of your processes (N >= 1). I state this only for completeness; there is no magic border at the edge of your process in which no, no, locks do not happen here. Locks control concurrency. When the problem set requires them, you use them.
I do not understand the spam of capital letters or the weird aggression. It’s like arguing about the coefficient of friction. The thing speaks for itself.
>I genuinely do not understand the spam of capital letters or the weird aggression about a trivial reality.
Then you better get with the program. Talking to people like the way you did won't make you any friends and will gain you many enemies. Don't worry though, I'm not that pissed off, just slightly miffed at your attitude. Also I like to use capital letters for emphasis. I guess you had a problem with that and decided to make it personal. Just a tip: don't act this way in real life, when you're older you'll understand.
>If you use Redlock to make a distributed lock over a Redis cluster and you allow two NodeJS fibers to acquire resources locks out of order by way of making asynchronous requests to it (noted only because you have a curious grip on that as being distinctive or meaningful here), you’ve still deadlocked and it is for all meaningful distinctions a deadlock of your processes (N >= 1).
Yeah because you're replacing your boolean in the earlier example with an isomorphic value. Either use a global js variable or a global value from redis. Same story. Nothing has changed from the locks you invented earlier.
Let me repeat my point. You shouldn't ever need to do the above in NodeJS, because the area where asyncio in python and NodeJS operates is stateless web applications. That's why NodeJS doesn't have locks. You have to go out of your way to make it happen.
>There is no magic border at the edge of your process in which no, no, locks do not happen here. Locks control concurrency. When the problem set requires them, you use them.
And your point is? I don't understand your point. Clearly nothing I said was to the contrary.
Let's say your problem set is writing a database. Then locks makes sense. Does NodeJS make sense for this problem set? No. Do Locks make sense for NodeJS? No. Global mutable state is offloaded to external services and that is where the locks live. This is the trivial reality.
Let's stay on topic with reality. In what universe does your diamond dependency need a dependency injection framework with locks in nodejs? If you need locks your fibers are sharing global state and you've built it wrong.
    const [b, c] = await Promise.all([B(), D().then(d => C(d))]);
    const a = await A(b, c);
That you conflate “async IO” and promises may be why you’re in this hole in the first place.
Async IO uses promises to abstract out its select (or moral equivalent), but promises are not async IO. A weirdly prescriptive attempt at dictating what these are "for" doesn't do much to obscure the thing; they speak for themselves.
I'm still really confused why a callback-hell topological sort and process would be somehow better than a cache, a lock, and a breadth-first search (not least because it's easier to follow and is also, anecdotally, faster), but clearly these mysteries are just plain beyond my pay grade.
>That you conflate “async IO” and promises may be why you’re in this hole in the first place.
It's all just single threaded cooperative concurrency with context switching at IO. The isomorphic APIs on top of this, whether callbacks, async/await or promises, are irrelevant to the topic at hand.
>I'm still really confused why a callback-hell topological sort and process would be somehow better than a cache, a lock, and a breadth-first search (not least because it's easier to follow and is also, anecdotally, faster), but clearly these mysteries are just plain beyond my pay grade.
I'm confused as to what the hell you're talking about. "Callback-hell topological sort and process"? Wtf is that? Where were callbacks used in my example? Where was sort used?
Do you not understand that the dependencies determine the order of construction? That's it; it doesn't matter what technique you use, the overall steps are the same. There is no bfs or callback hell going on. You manually instantiate the dependencies and choose what's async and what is sync. No need for locks.
Are you talking about something that takes a dependency graph and constructs the instance from that? If you want to do that, your algorithm is incorrect. You need post-order DFS, BFS won't work, but both BFS and DFS are O(N), so in terms of traversal over dependencies it's all the same.
    import asyncio
    from typing import Any, Awaitable, Callable, List, Optional

    class Node:
        def __init__(self, createAnObject: Callable[..., Awaitable[Any]], dependencies: List["Node"]):
            self.dependencies = dependencies
            self.constructor = createAnObject

    async def constructObjectFromDependencyTree(root: Optional[Node]) -> Any:
        if root is None:
            return None
        # Post-order: build all dependencies concurrently, then construct the root from them.
        instantiatedDeps = await asyncio.gather(
            *[constructObjectFromDependencyTree(node) for node in root.dependencies]
        )
        return await root.constructor(*[dep for dep in instantiatedDeps if dep is not None])
The algorithm is bounded by O(N), where N is the total number of dependencies.
If you want to construct an object with a total of N dependencies, then no matter how you do it, the operation will ALSO be bounded by O(N). In terms of speed it's all the same, but the above is how you're supposed to do it.
The above algorithm should give you what you want while providing concurrency and sequential execution exactly where needed. No callback hell, no promises, no sorting, no external shared state and no locks.
Regardless, if you're building Objects that necessitate such algorithms you are creating technical debt by creating things with long chains of dependencies. You should not be using your primitives to create large dependency trees; instead you should be composing your primitives into pipelines.
Additionally, relegating so much complexity to runtime is a code smell. If there aren't too many permutations bring it down to a manual construction with your code rather than an algorithm/framework.
Man, don’t do this here. You’re angry because I provided cites. You’re angry because you were overconfident, sweepingly general, wrong, and (worse) trivially proven wrong so you’re trying to well-actually out of it by being angry. I hasten to note that I am not being judgmental; I have been there. The best thing you can do is take the L and learn from it, dude. Everybody goofs sometimes, but you recall the best advice to take when you find yourself in a hole, yeah?
And... "typical web programmers"? I don't know how relevant that is except in the light that typical web programmers use the tools built by folks who understand concurrency well enough to build abstractions that make "you don't need to think about concurrency" mostly safe even though they're completely wrong. Somebody's gotta be doing that for you.
>Man, don’t do this here. You’re angry because I provided cites.
No I'm angry because of your attitude. What the hell is a cite?
I'll quote the drivel coming out of your mouth. Most of it is personal and doesn't even refer to the topic at hand:
>Quit while you're behind, my dude.
>This is dangerously wrong and I would suggest that you reconsider the steps that got you to this understanding because something really important has been lost.
>I would also submit that perhaps you should adopt a principle of charity and think real hard about whether your priors are correct before you start talking about what she "fails" to see. Rachel is one of those people who has Been Around and while I also have Been Around, I understand that Rachel has Been Around More and I probably should be listening more than I should be smarming at her.
>You’re angry because you were overconfident, sweepingly general, wrong, and (worse) trivially proven wrong so you’re trying to well-actually out of it by being angry.
>I hasten to note that I am not being judgmental; I have been there. The best thing you can do is take the L and learn from it, dude. Everybody goofs sometimes, but you recall the best advice to take when you find yourself in a hole, yeah?
Nothing I quoted was evidence that I'm wrong, but everything I quoted had a condescending attitude and was very personal. This is not how you engage in respectful conversation. It wasn't my plan to "do this here." I don't really give a shit, I'm just saying: if you mouth off with that garbage, of course the person on the other side is going to be a little pissed off. What the hell did you expect? My reaction isn't "weird" like you said earlier, it's a normal reaction to someone who is Rude. This was the small point I was trying to make, which you expanded on with a very personal remark.
That being said I'm not that angry, just a little, this is the internet after all. I don't care.
Also did you not see my first post? Did you not see me admit to being wrong on something? I have no issue with doing that. This is not a problem for me. Ever. If I'm still arguing with you it means I disagree with you. No actually, scratch that, a better way of saying it to you specifically is that it's not that I disagree with you, it's that you're flat out completely wrong and I'm right. See that? Same tack you have.
>And... "typical web programmers"? I don't know how relevant that is except in the light that typical web programmers use the tools built by folks who understand concurrency well enough to build abstractions that make "you don't need to think about concurrency" mostly safe even though they're completely wrong. Somebody's gotta be doing that for you.
Web programmers need to understand concurrency. In external services like databases you still need to deal with locks, race conditions and deadlocks. These don't disappear, and I never said they did. The topic is Python and NodeJS and async/await, and that is the context I am referring to... please stay on topic.
I never said “you don’t need to think about concurrency“ <--- this right here was made up and a total lie.
The rest of that paragraph is incoherent. Somebody is doing <what> for me?
On a side note, you haven't given me any concrete examples of when you need to realistically use your made up locks in nodejs. Your last diamond dependency scenario and code sample made no sense from a practical standpoint.
It depends on what kind of performance you need. For CPU intensive tasks I would agree with you. But for network I/O intensive tasks, even though Python is slow it's still more than fast enough to keep up with a large request volume since network I/O latency is so much longer than CPU/memory latency.
> It's around this time that you discover that people have been doing naughty, nasty things, like causing work to occur at "import time".
Is this something people actually have problems with in practice? I did lots of python and ran into it once. It was quickly fixed after a raised issue. I feel like non-toy development just doesn't experience it.
But maybe that's my environment bubble only. Do people who do serious python development actually have problem with this?
Python pretends to be a nice homogeneous "everything is at run time" language, but it is all a big lie and there aren't big flashing letters saying "you really probably shouldn't do this" when you start solving a problem in a certain way. For example, it is almost certainly best practice to _never_ call a function, class method, or static method inside a module that is going to be imported, and certainly never instantiate a class. However, there are certain patterns that almost necessitate it if you don't want to write loads of boiler plate or deal with the performance overhead of metaclasses. There are also a bunch of nice hacks like using `object()` at the top level as an instance distinct from everything else, but I'm sure there is a way that `MYTYPE = object()` will come back to absolutely ruin your day if you have to compare two `MYTYPE` instances in two different dicts derived from a parent process and a subprocess.
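(For anyone who hasn't seen it, the sentinel trick being referred to is roughly this; `lookup` is a hypothetical helper.)

    _MISSING = object()          # created once at import time, compared by identity

    def lookup(d, key):
        val = d.get(key, _MISSING)
        if val is _MISSING:
            raise KeyError(key)
        return val

It works because there is exactly one _MISSING object per process; as soon as that object crosses a process boundary via serialisation, the copy is a different object and every `is` check quietly fails.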
I have personally made this mistake on two or three occasions where I conflated file/module behavior with class behavior because I wanted a python file to act like it was a bit more declarative. Unfortunately this leads to a world of eternal pain. You can work around it, but you should have made everything a python class and pretended like the files/modules don't exist or at least have staggeringly different semantics hiding behind that innocent little `.` operator. Python simply cannot support the desire to solve a problem in a certain way because of the structure of the problem and forces you into using its happy path patterns if you want it to work in slightly different run time contexts.
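(And a minimal sketch of the import-time-work trap itself; the file and environment variable names here are made up.)

    # config.py
    import json, os

    # Runs the moment anything imports this module: in every worker process,
    # during test collection, in any tool that merely introspects it.
    SETTINGS = json.load(open(os.environ.get("APP_CONFIG", "config.json")))

    # Deferring the work keeps importing the module cheap and side-effect free.
    _settings = None

    def get_settings():
        global _settings
        if _settings is None:
            with open(os.environ.get("APP_CONFIG", "config.json")) as f:
                _settings = json.load(f)
        return _settings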
Two relevant posts from instagram engineering on the subject which suggest that best practices for avoiding these kinds of issues are non-obvious and easy to miss.
> but I'm sure there is a way that `MYTYPE = object()` will come back to absolutely ruin your day if you have to compare two `MYTYPE` instances in two different dicts derived from a parent process and a subprocess.
I don't see how this can be an issue with imports. You have two cases: you imported the module with MYTYPE before or after fork. Before: they will compare fine (unless IPC is involved). After: you transferred the dict with MYTYPE through some kind of serialisation or shmem, and you cannot compare identities; that's a property of IPC rather than something to do with python modules.
Anyway, what I meant in the previous comment was the risk matrix view. Sure, this can lead to bugs scoring various points in severity, but does it score high on likelihood?
Somewhat related to the RPC argument, but HTTP is a total joke, and therefore so is REST.
In adtech you send 204 responses a lot. The body is empty, just the headers. Headers like 'Server' and 'Date'. Apache won't let you turn Server off... 'security through obscurity' or some nonsense. Why do I need to tell an upstream server my time 50k times per second?
Zip it all up! Nope, that only applies to the body which is already empty.
Egressing traffic! A cloud provider's dream. I wonder what percentage of their revenue comes from clients sending the Date header.