Goroutines and APIs (gnoack.org)
106 points by gnoack on Nov 18, 2019 | 31 comments



I think this is universal advice for libraries: do not do multithreading inside; let the client code do it and/or pass in the necessary constructs.

A simple example for Java: do not create ExecutorService instances inside a library; instead, allow client code to pass them in.

Ayende wrote about this much better than I can, a long time ago: https://ayende.com/blog/159201/multi-threaded-design-guideli...


I think what Kotlin is doing with co-routines improves a bit on this. The approach there is to make asynchronous code look (almost) like normal code. The only difference is that 1) you are dealing with functions that are marked as suspend and 2) you can only call these from a co-routine scope.

The last bit is the crucial thing. The co-routine scope is how you structure concurrency and is something I have not really seen in other approaches. In a mobile application you have the main thread, IO threads, and maybe some CPU threads. When calling something that is asynchronous, you have to assign it a scope. When that scope ends (because of a timeout, error, etc.) there's a cleanup step that cancels all the dangling tasks.

The nice thing about this is that you get clean separation of how to structure your concurrency and the logic. They also made it very easy to integrate with existing asynchronous code (there are a lot of frameworks for this in Java).


Kotlin's approach is extremely nice (not perfect, but nice). For example, most of Kotlin's collection operations are inline, so in functions like mapIndexed you can call a suspend function even though mapIndexed itself is coroutine-agnostic (by nature of being essentially a template), as long as you're in a coroutine scope, of course. I don't often call suspending operations inside collection operations, but the inline fun approach works well in other areas (your own code). Other languages don't have this: in JavaScript, for example, map/filter/reduce won't await if you return a Promise, and C# would need specialized Task-aware versions of the LINQ operations (they probably already exist).

The scope tree is really nice too, once you get your head around it: much nicer than in C#, where you carry CancellationTokens around and create your own CancellationTokenSource when you need something like a supervisor scope.


The blog seems sadly unable to tolerate the load. I don't see an archived version of the post anywhere either.

If I had to have a guess what it says, it's probably something along the lines of:

* Don't require users of your API to start goroutines to use your API correctly.

If goroutines need to be started for your API to work, start them in an init function, on first use, or via some other mechanism.

* If possible, avoid high goroutine fanout in your implementation.

If a single request to your API fans out to 1k goroutines, it's going to be a problem for a high-traffic user of your API. Especially if there are other APIs that make the same design choice. As ever, try to be parsimonious with resource consumption.

IMO, these are two good principles to live by, and not just in Go.


(whoops, excuse the bad webserver config... should be fixed now)

> Don't require users of your API to start goroutines to use your API correctly.

Actually, what I postulate in the article is different... The theory is that making the caller kick off the goroutines puts more control over concurrency into the hands of the caller, and the resulting API is simpler.

> Avoid high goroutine fanout in your implementation.

That's for sure. :)


Doesn't the native net/http server spin up a goroutine for every request?

Edit: See line 2927 at https://golang.org/src/net/http/server.go


The net/http library is the caller in this case, so that would be consistent with the rule that callers should start goroutines.

Request handlers are a bit of a special case too, in that they are a framework for dispatching tasks to be worked on; what is main() for a command line program is the request handler for a webserver. It seems fair that there is some concurrency coordination happening at the top level.


Yeah, but it doesn't spin up a thousand for every request. And it doesn't require you to start them. It starts them under the covers as part of the API.


Right. So:

* Don't require users of your API to start [additional] goroutines to use your API correctly.


I guess what I mean to say is: don't make your users write

    go yourpackage.YourApiCall()
in order to use the API correctly. That's just a guess, though.


The article says almost exactly the opposite, and I agree.

Your API should present all of the things it does as synchronous functions/methods. If callers want to run them asynchronously, they can easily provide their own goroutine to do so, including whatever lifecycle management makes sense for them.

The concrete example was

    // Do this
    func (t *T) Run(ctx context.Context) { ... }

    // Not this
    func (t *T) Start()  { ... }
    func (t *T) Cancel() { ... }
This is generally good advice, which outright prevents a whole class of extremely common and tricky-to-debug orchestration bugs.


In general, decouple libraries from use case dependent assumptions. This holds true not just for concurrency, but for memory management, sockets, file streams etc. Don't assume that a per-object heap allocation is good enough for me. Don't assume that I want to serialize to a file system. Don't assume that a channel has a large enough buffer.


Some operations can use green threads, while others need kernel threads. Last I tried (~2015) Go would spawn unlimited kernel threads, so function callers had to know whether a call required a kernel thread or not, and rate-limit appropriately.

Is this still the case? How does modern Go handle operations which require kernel threads?


Yes, Go still spawns unlimited kernel threads (for liveness reasons). When I asked about this a few years back, go-nuts@ suggested implementing my own rate limiting in application space.

In my case, it was for running Go on a resource-constrained Raspberry Pi, where the kernel threads could easily live too long and use up all memory. The threads were calling read(2) on a network-mounted FUSE filesystem and would last for 30s+.


Which operations need kernel threads? There is runtime.LockOSThread if you need underlying thread affinity, but I'm not aware of any other mechanism to interact with the underlying threads of execution.


For example, os.Stat().


Interesting, I've never encountered a situation where that kind of thing matters, out of curiosity what are your use cases?


I can't read the article but hopefully it does not ironically present advice on writing webservers.

There is really little excuse for a web server to buckle under pressure these days - hackernews sends a spike of traffic but not enough to kill even a moderately spec'ed server.


The Raspberry Pi handles the load well after fixing that configuration mistake. (I still need to investigate to understand in detail what went wrong there...)


Often these kinds of problems are caused by system misconfigurations, not web servers buckling.

(Source: Am web server developer)


It'd only be ironic if the web server hosting the blog were a Go process, and even then it's a leap to conclude irony, because the fault may lie outside the server process itself, e.g. with the hosting company.


The problem with goroutines is that an unhandled panic can crash the whole program, and if the goroutine is started by a badly written library with no proper recovery, you are dead.


A panic only takes down the entire program if nothing recovers it. net/http recovers panics in its handler goroutines, so if your HTTP handler panics it will just kill the goroutine handling that request.


But that's good, right? As a rule, panics should terminate your program.


Maybe you should get to choose, or at least be aware of it. I have seen it happen in network code, which is by definition unreliable...


Yeah, bad libraries exist :(


"threads" (goroutines) and channels are the new gotos. They tend to make a codebase more difficult to reason about. The more you add, the more you tend to get spooky action at a distance.


These are hardly new concepts, nor are there any more appropriate mechanisms for modeling dependencies between multiple concurrent actors. If anything, the traditional shared memory nightmare has proven to be the goto in the room.


Funny you should say actors. While some people will equate the two, I find Erlang actors easier to reason about than Go channels, though that has everything to do with monitors and links, which tie dependent goroutines/processes together.


And, like the goto statement, surely there will be scolds warning everyone not to use these anywhere instead of treating them like any other language feature to be used with care and as sparingly as possible.


Unfortunately, Moore's law is no longer expressing itself in terms of single-threaded performance, so the only way to take advantage of the faster CPUs being released is to write multi-threaded code.

Many alternatives to channels (mutexes, atomics, semaphores, condition variables) have just as much if not more chance of turning the codebase into a mess. But yes I agree that if performance-wise you can handle being single-threaded, then do that, because it's much simpler.



