Goroutines are not significantly lighter than threads (matklad.github.io)
120 points by todsacerdoti on March 12, 2021 | 112 comments



Not only does 3x seem quite significant, but thread context switches are also a large overhead, and that goes untested here. If thread context switch overhead were as low as usermode context switching, there would be no use for coroutines, since you could just use threads instead; I doubt it's trivial.

(Of course, in Go, the scheduler also weaves in the GC, IIRC, so an apples-to-apples comparison may be difficult. Microbenchmarks are just not that useful.)

P.S.: this article seems to work under the assumption that 10,000 Goroutines is a reasonable upper limit, or at least it feels as though it implies that. However, you can definitely run apps with 100,000 or even 1,000,000.


The performance of the Go runtime with a million blocked goroutines is pretty OK, but its performance with even 1000 runnable goroutines is not great at all. You really need to think about which you are going to have.


Is 1k runnable goroutines materially worse than 1k runnable threads? (Honest question, I don't know myself).


True, but if you regularly have 1000 runnable goroutines then could you not reconfigure your app to have, say, 64 runnable goroutines, and get better throughput? Large numbers of goroutines do seem to be a good fit for problems which are mostly waiting on the network.
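
Something like this shape, roughly (a sketch; the sizes are illustrative, not tuned):

    // A fixed pool of 64 workers draining a shared queue, so at most
    // 64 goroutines are ever runnable regardless of offered load.
    work := make(chan func(), 1024)
    for w := 0; w < 64; w++ {
        go func() {
            for job := range work {
                job()
            }
        }()
    }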


The architecture of Go forces you to have 1 goroutine servicing every socket, so the number of runnable goroutines will then be at the mercy of your packet inter-arrival process.


Go does not force you to do any of that: https://github.com/panjf2000/gnet (and as you can see, it's as fast as the fastest C++ libs).

Go provides goroutines as a building block; it's up to you to use that or something else (reactor pattern, epoll, etc.).


Aren’t sockets the textbook example of what’s usually blocked? When do you have 1000 sockets all ready to read?


Systems often solicit these kinds of packet storms, for example in root-and-leaf search architectures. See "TCP incast" for writings about this problem.

https://www.usenix.org/system/files/login/articles/chen12-06...


I'm not too familiar with Go, but could you make the socket-routines very lightweight? Like dump the work onto a thread pool, and then go back to sleep?


Does it? You could implement a classic select-based reader and writer in a single goroutine, to handle many sockets at once.


Please describe how you would actually do that in Go.


Using epoll directly (via syscalls) like here https://github.com/tidwall/evio


My information is outdated, but last I knew, Go always parked a goroutine when making a syscall. Seems like a lot of overhead to read bytes that you know are there from an epolled fd.


You could put the epoll fd in non-blocking mode, wrap it with os.NewFile, use SyscallConn() to get a syscall.RawConn object, and then use its Read method. Its Read method is special: you can return "not ready", and it will use the Go runtime poller to wait until it's readable, in this particular case effectively putting an epoll inside an epoll.

In epoll case, using RawConn.Read would look like this:

    var nevents int
    err := rawConn.Read(func(fd uintptr) bool {
        // Non-blocking poll (timeout 0) of the inner epoll fd; events
        // is an []syscall.EpollEvent from the enclosing scope.
        var err error
        nevents, err = syscall.EpollWait(int(fd), events[:], 0)
        if err != nil || nevents == 0 {
            return false // not ready; the runtime poller waits and retries
        }
        return true
    })
Note that using RawConn.Read here is only necessary because epoll needs epoll_wait(2) instead of typical read(2). For ordinary file descriptors, like pipes, etc., setting them to non-blocking mode, wrapping them with os.NewFile, and using its ordinary Read/Write methods is sufficient.
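
For an ordinary fd, that's just (a sketch, error handling elided):

    // Mark the fd non-blocking *before* wrapping it, so os.NewFile can
    // register it with the runtime poller; Read then parks only the
    // goroutine, not the OS thread.
    syscall.SetNonblock(fd, true)
    f := os.NewFile(uintptr(fd), "pipe")
    n, err := f.Read(buf)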


Not exactly. It depends on the syscall. Syscalls that are known to be slow are treated the way you're describing, but syscalls that are often fast are treated optimistically in a way that allows the work queue to be stolen and spawned onto a new thread only if the syscall doesn't return quickly.

I don't know which category epoll fits into, but it seems like it should be treated optimistically, since epoll is used for non-blocking I/O.

https://utcc.utoronto.ca/~cks/space/blog/programming/GoSched...


read(2) has to fit into the "might block" bucket, though, right? It would take a very smart runtime to figure out you just called epoll recently and established it won't block.


> If thread context switch overhead was as low as usermode context switching, there would be no use for coroutines since you could just use threads instead

Two points: a) native thread switching overhead is extremely low, probably lower than usermode Go thread overhead; b) the utility of usermode threads isn't to lower overhead.

It's a given that usermode threads introduce lots of overhead, if only because you're reimplementing the scheduling machinery that already exists in the kernel and that you're already paying the performance penalty for.

But theoretically, you could implement specially tailored scheduling algorithms specific to your app and gain an advantage there - kernel schedulers are general-purpose and can introduce latencies, starvation, and other undesired special effects.

P.S. The Linux kernel also has interfaces for changing the thread scheduler algorithms. This gets you 99% of the way there - native threads + round-robin realtime thread scheduler gets you the performance benefits you want without having to rewrite schedulers and context switchers in usermode for every language runtime.
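
For reference, a rough sketch of what switching the calling thread to SCHED_RR looks like from Go (Linux-only, needs the right privileges; the struct layout and SCHED_RR = 2 are hand-copied from sched.h, so treat it as illustrative, not as a vetted API):

    runtime.LockOSThread() // scheduling policy is per OS thread
    type schedParam struct{ priority int32 }
    p := schedParam{priority: 1}
    // sched_setscheduler(0 /* calling thread */, SCHED_RR, &param)
    if _, _, errno := syscall.Syscall(syscall.SYS_SCHED_SETSCHEDULER,
        0, 2, uintptr(unsafe.Pointer(&p))); errno != 0 {
        panic(errno)
    }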


"native threads switching overhead is extremely low, probably lower than usermode Go thread overhead"

This is most likely very wrong.


When you think about it in depth, the kernel can context switch between threads essentially for free when returning from a syscall (the work to restore a userland context is the same regardless of which context it's switching to, disregarding cache effects).

In most apps, context switches would happen only due to kernel->user transitions anyway (such as an I/O syscall finishing or a timer triggering), so in most apps kernel context switches are indeed almost free.

To be significantly more efficient than the kernel, there would need to be many more context switches than syscalls, which for non-synthetic programs is mostly only possible under heavy loads that can batch, with suitable I/O interfaces such as io_uring.


It isn't. Kernel context switching is the same thing as language runtime context switching, except you get access to kernel mode and you don't have to keep track of garbage collection stuff.


I thought the major cost was the cold cache after a switch, in both cases?

Kernel context switches are pretty light compared to the cache-miss slowdowns that follow them.



I thought this article would talk about overcommit and demand paging, but it just compared the RSS of two processes and called it a day.

They don't even attempt to distinguish the size of the stack from the rest of the runtime.


With MADV_FREE behaviour, RSS isn't even a useful measurement.


I think there's a bug, tho it might not make a difference to the results:

    for i := 0; i < 10; i++ {
       go func() {
          f(i) // sees whatever value i has when f() is called, usually 10
       }()
    }


This is a very common newbie mistake. Fix by passing in i.

    for i := 0; i < 10; i++ {
       go func(i int) {
          f(i) // i is the parameter here: the loop's value at spawn time
       }(i)
    }


absolutely wild to me that Real Engineers (TM) complained about this in JavaScript for years only for it to be called a "newbie mistake" in Go.


The same is true of most of JavaScript's warts


This looks very much like the mistake in JavaScript with callbacks in for loops.


Note that using `let` instead of `var` in your for-loop fixes this in JavaScript (ignoring IE's faulty implementation of `let`).


For the curious reader: this can be solved by simply passing i as a parameter.

    for i := 0; i < 10; i++ {
        go func(i int) {
            f(i)
        }(i)
    }


Or commonly, tho perhaps confusingly:

    for i := 0; i < 10; i++ {
       i := i // new variable i for every iteration
       go func() { f(i) }()
    }


Yea. I do this personally - it's easy copypasta since it infers types for you, and as it shadows the iteration var it's impossible to use the wrong one. Same trick works for other local vars e.g. outside the loop, though at that point it's just as not-copypasta-able as the `func(arg){}(arg)` construct.


Thanks for the correction, that is indeed a bug!


Interesting. Integers are captured by reference in Go?


Closures capture all in-scope variables by reference in most languages.

Same situation in Python for example: https://stackoverflow.com/questions/54288926/python-loops-an...


You can get bitten by a similar problem without closures if you stick a loop variable in an array. The loop variable is one address with contents mutating each iteration.


> You can get bitten by a similar problem without closures if you stick a loop variable in an array.

Surely only if you store it indirectly, otherwise it'll be copied into the array.
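
For instance, in Go, where the loop variable is a single variable reused across iterations:

    var ptrs []*int
    var vals []int
    for i := 0; i < 3; i++ {
        ptrs = append(ptrs, &i) // indirect: all three alias the same i
        vals = append(vals, i)  // direct: copies, so 0, 1, 2 as expected
    }
    // here *ptrs[0] == *ptrs[1] == *ptrs[2] == 3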


This has been known since the days of NPTL and the O(1) scheduler. I guess it has to be relearned every decade or so.


Known but not well-known. There are many false beliefs out there. One of the most durable is the incorrect belief that a thread incurs the maximum stack size on creation, which isn't true on Linux.


Too much ego is at stake on wanting to write cool shiny new async language runtimes and not wanting to learn how your existing kernel API's work.


3x seems significant?


Not only that, but the author takes a single data point and attempts to draw a line.

Without measuring 10k, 20k, 30k of each and seeing how the memory usage changes, we are seriously comparing apples to oranges here.

Go's allocator will hold onto a decent chunk of memory just to amortize the cost of reaching out to the operating system. Just asking the operating system how much memory the Go application is using doesn't tell the whole story in this benchmark.

Is it 3x or 30x? The author certainly doesn't know based on taking a single data point from each application.
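
A sketch of the kind of measurement that would say more (names illustrative; even this is rough, since MemStats.Sys includes the allocator slack mentioned above, and each n is best run in a fresh process):

    func measure(n int) uint64 {
        var before, after runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&before)
        stop := make(chan struct{})
        for i := 0; i < n; i++ {
            go func() { <-stop }() // parked goroutine, stack allocated
        }
        runtime.GC()
        runtime.ReadMemStats(&after)
        close(stop)
        return (after.Sys - before.Sys) / uint64(n) // bytes per goroutine
    }
    // run with n = 10000, 20000, 30000 and compare the trend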

Not only are goroutines more memory efficient than full OS threads, they're also cheaper to spawn, and the Go runtime can optimize for certain common use cases very effectively.

I've also personally had serious difficulty in the past with getting Linux distros to let me launch tens of thousands of OS threads. Long before I run out of memory, I hit various limits that block the program from spawning additional threads. Some of them can be adjusted, but I never managed to spawn threads arbitrarily up to the amount of memory that was available... probably just my own failing, but that alone is reason enough not to spawn an unbounded number of OS threads.


There was a related post about a month ago[1].

I believe the article fails to account for, among other things, the virtual memory overhead that comes with sparse allocations for large maximum stack sizes, which would about double the numbers. With 2 MB or so of stack spacing, if you only use a single page of RAM per thread, you still need another whole page of RAM in the "page table" (actually a trie), and on Linux those are not counted in the process RES memory usage.

That being said, creating thousands of native threads has lots of other pitfalls, including stack ones - would not recommend as a general strategy.

I haven't tried creating actual threads, but for test-mapping stack-like staggered memory, I've had success with:

  sysctl vm.overcommit_memory=1
  sysctl vm.max_map_count=10000000
  swapoff -a
Remember to keep an eye on free memory, not process RES or similar, which doesn't count page tables and other overhead.

[1] https://news.ycombinator.com/item?id=25997506


Thanks, RSS not counting memory for page tables themselves is a good point, I will correct that in the article tomorrow.


Note that page tables are the per-process virtual memory mapping data structure for the CPU. AFAICT, Linux has separate accounting of the memory, which is smaller, but also per-thread structures, which are likely not included in RSS either.


It does; the article and results contradict the title. I'm not sure that "lightweight" can be captured purely by memory usage either.


To argue in favour of the article: if you do real work in the coroutine / thread, there is going to be much more data in the work context than the stack and the memory usage difference is likely to be negligible.


No, the article correctly points out:

> A thread is only 3 times as large as a goroutine. Absolute numbers are also significant: 10k threads require only 100 megabytes of overhead. If the application does 10k concurrent things, 100mb might be negligible.

If it's still not clear: if your application has a good reason to run 10k parallel tasks it's most likely doing something complex that requires plenty of RAM.

It's very unlikely that you really have to save those 100MB of RAM and at the same time you cannot rethink the architecture to stop using this level of parallelism.

And even so, that would justify using a coroutine library, not a whole programming language.


> The most commonly cited drawback of OS-level threads is that they use a lot of RAM.

"RAM" is inaccurate in that sentence.

It's important to realize that the common knowledge "you can't have 10k threads" comes from two decades ago, when we were running 32-bit CPUs. A thread may only need a couple of pages of RAM, but it needs a whopping 1 MB (or more) of virtual address space (at least with default stack sizes). You simply can't start 10k threads in a 32-bit process, as you'd need 10 GB of address space. OS-level threads use a lot of address space and thus were unsuitable for the C10k problem back in the day. But in the 64-bit world, 10 GB is nothing: the address space usage can almost be ignored, and the RAM usage is much less bad. (But the RAM usage of page table entries also needs to be considered; the article forgot about this.)

Of course, just because it works now doesn't mean that 10k threads are a good idea.


Nice elaboration!


Good start, but very simplistic.

What happens when these threads or goroutines start doing real work (CPU, I/O) concurrently?


I’d be curious to know how C++20’s stackless coroutines compare on memory usage.


C++ stackless coroutines are pretty lightweight.

A few years back, when I was writing something for a presentation, I spun up 50 million coroutines, arranged them in a ring with channels, and passed a value all the way around from each coroutine to the next, on a Windows laptop with 8 GB of RAM.

I don't recall the exact amount of memory used, but my laptop handled it really well.


That's stunning. If (honest if, I really don't know the answer) they consume 4 KB of memory each, that would be 50M * 4K = 200 GB.


They use just what you need for the variables. In this case, I didn't have anything heavyweight, so maybe only a few tens of bytes per coroutine.


> A thread is only 3 times (memory consuming) as large as a goroutine.

Now, the official implementation allocates a 2 KB stack for each new goroutine; the initial stack could be much smaller in theory.


Aside from significant memory use from uncounted overhead for native threads, the fun starts when your threads start actually using more than a single page of stack, calling into who-knows-what instead of just sleeping.

Typical programs will transparently grow stack on demand, both with native threads (memory not mapped until used) and goroutines / green threads.

Unlike the green threads of typical runtimes, I don't know of a native threading implementation that releases no-longer-used stack space until, I guess, the thread is destroyed, so sporadic large stack use can blow up your memory.


> Unlike the green threads of typical runtimes, I don't know of a native threading implementation that releases no-longer-used stack space until, I guess, the thread is destroyed, so sporadic large stack use can blow up your memory.

They don't. Although if you know that may be a concern for your application you can easily enough `madvise` after calls which may map significant amounts of stack, in order to release it.
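
The mechanism, sketched on an anonymous mapping in Go (a real thread stack additionally needs its base address and currently-unused extent, which is the fiddly part):

    mem, _ := syscall.Mmap(-1, 0, 8<<20,
        syscall.PROT_READ|syscall.PROT_WRITE,
        syscall.MAP_PRIVATE|syscall.MAP_ANON)
    for i := range mem {
        mem[i] = 1 // fault every page in: RSS grows by ~8 MB
    }
    // pages become reclaimable, mapping stays valid, RSS shrinks
    syscall.Madvise(mem, syscall.MADV_DONTNEED)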


While possible, sprinkling code with madvise syscalls to release memory doesn't seem like a very realistic proposition. Information on how much stack is actually mapped is not readily (or cheaply) available.

The article says:

> The workload is representative for measuring absolute memory overhead, but is not representative for time overhead.

The workload is definitely not representative of real memory overhead. What's worse, it almost can be, but it is very dependent on the pattern of stack usage and it may not be obvious that you are holding on to large amounts of stack.

Default stack size on Linux is 8 MB. If you don't use much stack, 10k threads might only need 200 MB of RAM. Or your threads might touch 2 MB of stack early in their lifetime, parsing recursive data structures or whatever, and you need 20 GB of RAM, because your stacks don't get freed.

Realistically, when using green threading, I wouldn't give much thought before using 10K waiting threads.

With native threading, I would look for other reasonable solutions, because it's not worth the headaches, but in case they do turn out to be the most reasonable solution, I would be sure to appropriately lower the stack size to avoid OOM, budget at least a week for solving gotchas, and forever need to be vigilant about stack usage.


Under Linux, memory is allocated on first access to a mapped page, so no memory is actually being allocated up front in either case.


Yeah, but it's significantly cheaper to switch between goroutines than between threads.


Yeah that's what I was thinking as well. The article leads with "The most commonly cited drawback of OS-level threads is that they use a lot of RAM." and that's not the impression I had at all.

I've always felt that the biggest reasons that threadless concurrency primitives were adopted was to avoid the non-negligible cost of spawning threads and context switching between them.


Even if Goroutines were significantly heavier than threads, I would prefer them for many reasons.


Such as?


Easier to reason about, easier to communicate. Threads are a low-level primitive; goroutines seem to be designed with concurrency in mind.


It's the channels that make goroutines easier to reason about, not the green threads. There's plenty of channel implementations on native threads.


Example of a C++ channel library that is easy to use? The only one I've seen was packaged with Google's internal C++ fiber library, which depended on custom Linux kernel patches.


I haven't done much actor model programming in C++ in nonproprietary envs so I'm not sure there.

But Rust's mpsc package is a good public example of the concept off the top of my head, running on native threads. They're really not that complicated, and a C++ version would be on the order of a few hundred lines.

I had been using the concept since the 90s, as RTOSes really love it. Pretty much every N64 game uses native threads communicating via "software FIFOs".


The Rust one you mention is multiple-producer, single-consumer, and it is not nearly as feature-rich as a channel in Go. By the time one deals with tokio, async/await, and passing the (rx, tx) to the right thread with clone()/copy(), the developer will be tired. There is a huge difference between library implementations of channels and native language support.


You don't need any tokio or async/await to use mpsc. It's in the standard library and works just fine with a thread spawned with a closure to work on.

It's split into a read half and a write half so you don't have to worry about copy or clone, as an execution context will only have one or the other. I guess you clone the write half to give it to multiple producers, but that's not a huge deal.

The example here, for instance, doesn't look any more complicated than using Go channels, and it's a full executable example you can start by clicking the run button.

https://doc.rust-lang.org/std/sync/mpsc/fn.channel.html


> Rust one you mention is multiple-producer single-consumer. And it is not nearly as feature rich as a channel in Go. By the time one deals with tokio, async/await, passing the (rx,tx) to the right thread with clone()/copy(), the developer would be tired.

tokio and async/await have little to do with channels (other than having their own async-aware versions).

If you need a real mpmc channel, crossbeam[0] provides one. The interface is similar to the stdlib's, but you can clone the receiver as well as the sender.

[0] https://docs.rs/crossbeam/0.8.0/crossbeam/channel/index.html


Tokio use is often not by choice. A web service might need hyper, which uses Tokio underneath. Put a blocking crossbeam call in between and you would bring concurrency to a halt.


An issue with doing that kind of programming in C++ is ownership. Basically if you have goroutines/threads and channels, then GC is helpful.

So the Go language is really giving you 2 things together that C++ doesn't have that makes concurrent programming easier. But arguably they could have just given you threads and channels and GC, and not goroutines. That is, leaving out M:N threading. (I thought gccgo did that a long time ago or still does?)

You can use threads and channels (which are just thread-safe queues) in C++, and follow a convention where you NULL out pointers after passing them across a channel. So basically you have single ownership.

That is all a manual process, which is OK sometimes, but when you add in threads and the possibility of nondeterministic bugs, then maybe it's easier to see why that style of programming is not super popular. Though I have done it for small programs and it works well.
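
To make the "channels are just thread-safe queues" point concrete: the whole construction is a mutex plus a condition variable, available in any native-threads language. A minimal sketch, in Go syntax:

    type queue struct {
        mu    sync.Mutex
        cond  *sync.Cond
        items []interface{}
    }

    func newQueue() *queue {
        q := &queue{}
        q.cond = sync.NewCond(&q.mu)
        return q
    }

    func (q *queue) send(v interface{}) {
        q.mu.Lock()
        q.items = append(q.items, v)
        q.mu.Unlock()
        q.cond.Signal() // wake one blocked receiver
    }

    func (q *queue) recv() interface{} {
        q.mu.Lock()
        defer q.mu.Unlock()
        for len(q.items) == 0 {
            q.cond.Wait() // releases mu while sleeping
        }
        v := q.items[0]
        q.items = q.items[1:]
        return v
    }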


I'm not sure a GC really helps that much. Yes, there's a few memory unsafety bugs from ownership issues fixed, but you still need to forcibly forget about any references you had to the data being passed along the channel so you don't access it concurrently (which is the case where go's memory safety guarantees start breaking down too, concurrent access to mutable pointers).


Yeah that's a fair point. Well then you don't need much to implement a thread-safe queue in C++, and that's a big step toward Go-style concurrency.

If you want to be really minimal you could just use pipe() with each end in a different thread but the same process, even passing pointers over it.

Probably the bigger deal is the consistent networking libraries, and the consistent mechanisms for timeouts and cancellation that Go offers. I think that's difficult in most C++ programs using threads. (I'd be interested to know what C++ programs do that well)

On a project several years back, that's what pushed me toward async -- doing timeouts in a principled way.

Though to be fair to Go, I would say that memory corruption bugs can be highly non-local (from symptom to code fix), and race conditions less so. Bugs that have both properties are really painful and I think Go mostly helps you avoid the former.


Go code can still have ownership issues around channel lifetime (panic on sending to a closed channel, etc.).


I'd say that they are easier to write code with and to reason about compared with, say, POSIX threads.


How come? How are goroutines different from real threads such that they're easier to reason about?


In theory, they are equivalent. In practice, POSIX blocking IO does create problems around cancellation and selectability. For example, there’s no good way to cancel an outstanding blocking read call, and you might need to do weird hops to work around that.

If you do goroutines, you need to redo all IO yourself anyway, and that’s a good opportunity to implement things like universal support for cancellation and timeouts.
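
For example, the standard trick for cancelling a blocked read on a runtime-poller-backed net.Conn (a sketch; conn is a net.Conn, ctx a context.Context):

    done := make(chan struct{})
    go func() {
        select {
        case <-ctx.Done():
            // Forcing the deadline into the past wakes the blocked
            // Read with a timeout error.
            conn.SetReadDeadline(time.Now())
        case <-done:
        }
    }()
    n, err := conn.Read(buf)
    close(done)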


Faster context switches. Vastly larger recursion depth before stack overflow.


How often is recursion depth a constraint on your programming?


You're being a massive contrarian right now - asking others for supporting evidence while providing none to support your own position.


Believe it or not, it was a genuine question.


Right, and there's nothing wrong with genuine questions - however, each question is asking for the commenter's time to satisfy your own curiosity, and there is a threshold where asking a bunch of loosely defined open-ended questions no longer contributes value to a conversation (because they become rhetorical, unanswerable, or the commenter has moved on and doesn't want to commit to a full answer), etc.

In short it gets to a point where it seems like you're only asking questions to prove someone else incorrect or even inferior - versus to actually further a conversation by contributing concrete evidence. This may not have been your intention, or I could be misinterpreting it, but in your shoes I would want someone to tell me they had this impression so I could consider it going forward.


> The most commonly cited drawback of OS-level threads is that they use a lot of RAM. This is not true on Linux.

I was under the impression this was context switching rather than memory usage, which I haven't seen much about when these models are compared.


Goroutines had excellent context switching back when Go preferred to run in a single thread. Version 1.3 or so?

Now that Go defaults to multiple threads it has lost all of that advantage. Goroutine switching and channel send/receive has to apply all of the locking that any multithreaded program has to use.
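
Easy enough to probe with a ping-pong microbenchmark, run once with GOMAXPROCS=1 and once with the default (a sketch; absolute numbers vary a lot by machine):

    ping, pong := make(chan struct{}), make(chan struct{})
    go func() {
        for range ping {
            pong <- struct{}{}
        }
    }()
    const n = 1000000
    start := time.Now()
    for i := 0; i < n; i++ {
        ping <- struct{}{} // each iteration is two goroutine switches
        <-pong
    }
    fmt.Println(time.Since(start)/n, "per round trip")
    close(ping)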


You have to use locking only with inferior frameworks that never heard of lock-free threading or non-blocking IO. Locking also makes concurrency not only slow but also unsafe.


How does Loom threads overhead compare to goroutines and Kotlin coroutines?


Well, the costs are subtly different. The size of the virtual thread object is fairly small, but it is not the only cost. Rather than allocating a stack for a virtual thread, we freeze chunks of stack into special objects and thaw them back onto the stack of the OS thread (we can do this because we know there are no pointers into Java stack frames). This means that the cost can be very small, but you may place more of a load on the GC if a collection occurs between the freeze and the thaw, which causes this stack object to be moved.

[edit] I should mention that there are other potential overheads if you use things like thread-local variables, as these may require more work from the GC to be collected. We are working on a new mechanism which should be better in this regard and be a better API for many of the uses of thread locals.


>If the application does 10k concurrent things, 100mb might be negligible.

When you think about memory in quantities of 1, 10, 100, 1000 million units, a few hundred MB makes a big difference (e.g. Android devices).


On an Android device you don't need 10k coroutines.

And on servers: people implemented web servers like Apache 25 years ago without coroutines, and they ran with very limited RAM.


Developers don't care about memory on consumer devices. Look at any modern website.


Hence the word “might”. If it is not negligible, then one needs to opt for something like async IO at the cost of more complexity.


For giggles, would've liked to have seen Erlang/Elixir in the mix.


Slightly heavier than goroutines: according to the efficiency guide[0] a process is 338 words (2704 bytes), when spawned.

This includes a 233 words (1864 bytes) private heap.

This likely wouldn't work for TFA (as it uses "active" threads) but it should go lower if you immediately hibernate the processes:

> The heap is then shrunken to the exact same size as the live data that it holds (even if that size is less than the minimum heap size for the process).

While I've not tested it, I assume a just-started process has nothing on the heap, so it'd go down to just the "overhead" 105 words (840 bytes).

Though that's at the cost of immediately allocating a new min-sized heap when the process wakes up.

[0] http://erlang.org/doc/efficiency_guide/advanced.html


And Haskell. Haskell's green thread overhead is half of Erlang's.


Haskell's concurrency blows Go's out the water.


There are cases where coroutines and threads can be used interchangeably.

But oftentimes they coexist to solve the problem. That's the case in Go: goroutines run on threads, after all.

The OP could have used a better example, like a web/TCP server handling 100k connections, and shown the difference between coroutines alone, threads alone, and both combined.


Would a better comparison let the threads build up a few MBs of non-identical stacks?


I guess this one contrived benchmark proves it. Case closed.


It's curious how people associated with the Rust project will vocally complain about unfair/shallow criticism then turn around and post something like this.


Please don't post in the flamewar style to HN. It leads to flamewars, which are predictable, nasty, and dumb.

https://news.ycombinator.com/newsguidelines.html


Out of curiosity, how is this unfair/shallow? It's not obvious to me


Given the domains where Go is most likely to be used, any comparison of goroutines vs threads should include the netpoller vs blocking socket i/o.


Why would using threads imply using [thread-]blocking socket IO?

Most HLL runtimes (that you’d ever want to use for writing a server) have some async IO completion mechanism using scheduler-reserved “async IO waiter” threads, exactly akin to netpoll. (Or they plug into the OS in a less-portable way that avoids blocking IO syscalls to begin with, like Node’s libuv.)

Or, if the language’s runtime doesn’t do it for you, then the popular connection-pooling abstractions that the language’s server libraries are built on top of do it for you, by spawning their own AIO completion threads. Jetty’s worker pool in Java, Tokio in Rust, etc.

Honestly, who’s out there writing code in 2021 where there are worker POSIX threads, and those threads are (directly or indirectly) calling read(2)?


I do :)

I don’t write web services, so I have low O(1) things to communicate with, for which threads with blocking read work OK. They are not perfect, as there is no cancellation, but working around lack of cancellation in a couple of places is less costly than bringing dependency on a relatively fresh Rust async ecosystem.


Threads and blocking calls are the programming model exposed by Go, and the fact that it can do this efficiently is one of the few notable accomplishments of an otherwise unremarkable language. Many people find threads and blocking calls much easier to reason about than the async model of, for example, libuv.


Well, I mean, you can eat your cake and have it too (i.e. have a language without the non-determinism of preemptively-scheduled green threads, while still having “blocking” call semantics.)

.NET Orleans, for example: technically cooperative-scheduled tasks/fibers, but where yield points (i.e. generated state-machine transition points) are decided at compile-time; and where the inability to place such a yield-point within a soft-real-time-bounded distance from the previous one is a compile-time error. You don’t have to place the yield points yourself (like explicit async code), but they’re right there to see when you debug or disassemble the program, and you can use pragmas to adjust where or if they’re placed.

Technically the Erlang runtime is this way as well (cooperatively-scheduled, with “explicit” yield points) but since one of those explicit yield-points is use of the CALL or RET ops—and since there are no looping ops, only tail-CALLs—it’s kind of hard to write an explicit atomic-CPU-blocking hard-realtime task-segment in Erlang. (You can, but you have to unroll all your loops. Which Erlang has no tooling support for. Much easier to just push the problem to C through a NIF. At least the Erlang scheduler has explicit design support for these long-running blocking NIF operations through “dirty schedulers”, so it’s not like you’re breaking the runtime’s operational tolerances by trying to do hard-realtime things in it.)


I don’t think every benchmark should be a benchmark of everything. Targeted micro benchmarks are valuable, if they are explicit about their applicability scope.

The post is very direct about looking only at the memory usage, and literally says that using the results to reason about overall performance would be wrong.

I see how the article could be read as “goroutines bad, threads good”, as it doesn’t go to extraordinary lengths to prevent that. But I prefer to cater to a careful reader, and not add loud disclaimers repeatedly.


Yes, I think I read a little too much into your piece. I had on my mind at the time a certain prominent former Rust contributor's repeated bashing of M:N threading as well as the silly firestorm that erupted from that (admittedly not great) recent blog post critical of async Rust.


[flagged]


Please don't post flamewar comments to HN. It's not what this site is for.

https://news.ycombinator.com/newsguidelines.html



