
This code has a significant and common flaw: If the token is cached, it invokes the handler synchronously.

You should never design a function with a completion handler that sometimes invokes it synchronously and sometimes asynchronously. That can cause all sorts of problems, the simplest of which is a caller that expects it to be asynchronous and ends up performing work out-of-order because the handler was invoked synchronously.
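To make the hazard concrete, here is a minimal Python sketch (the names `TokenService` and `fetch_token` are hypothetical stand-ins for the article's Swift code): with a warm cache, the handler fires before the line that follows the call, inverting the order the caller expected.

```python
import threading

class TokenService:
    """Illustrative sketch: a cache hit fires the handler synchronously,
    a cache miss fires it from a background thread."""
    def __init__(self):
        self.cached = None

    def fetch_token(self, completion):
        if self.cached is not None:
            completion(self.cached)          # synchronous path
        else:
            def work():
                self.cached = "token-123"
                completion(self.cached)      # asynchronous path
            threading.Thread(target=work).start()

# A caller that assumes the handler runs *after* the lines below the call:
events = []
service = TokenService()
service.cached = "token-123"                 # simulate a warm cache
service.fetch_token(lambda t: events.append("handled"))
events.append("prepared")                    # meant to run first!
print(events)                                # → ['handled', 'prepared']
```

With a cold cache the same program prints the other order, which is exactly the inconsistency being criticized.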

The final form of the code in this article accidentally fixes this, by moving the work it does onto a background queue, but this just introduces a separate flaw: it performs the callback from its internal private queue. This isn't necessarily a problem, except in that it will block any other work from running on that queue unnecessarily (as it's a serial queue), but in more complicated code, having external code running on your private queue can indeed cause problems.
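One conventional fix is to hop off the private queue before invoking external code. A hedged Python sketch of that shape, using two single-thread executors to stand in for GCD serial queues (all names illustrative, not the article's API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Loader:
    """Sketch: do work on a private serial queue, but hop to a separate
    executor before invoking user code, so a slow callback cannot block
    the private queue."""
    def __init__(self):
        self._serial = ThreadPoolExecutor(max_workers=1)     # private serial queue
        self._callbacks = ThreadPoolExecutor(max_workers=1)  # where user code runs

    def load(self, completion):
        def work():
            result = "token-123"                  # pretend network call
            # Never run external code on the private queue:
            self._callbacks.submit(completion, result)
        self._serial.submit(work)

done = threading.Event()
out = []
loader = Loader()
loader.load(lambda r: (out.append(r), done.set()))
done.wait(timeout=5)
print(out)   # → ['token-123']
```

In GCD terms, the equivalent move is dispatching the completion onto the caller-supplied queue (or the main queue) rather than calling it inline on the internal serial queue.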

---

There's also another issue: if the AccessTokenService gets deinited before the loader calls its callback, the callback is dropped on the floor. In general, you should not design a function that takes a completion callback with conditions under which the callback is never fired. There are three reasonable behaviors that could be implemented here:

1. Have the callback keep the AccessTokenService alive. It's pretty common for networking operations to keep themselves alive while executing, so this is probably the approach I'd take.

2. Have the AccessTokenService fire all of its pending completion handlers with an error in deinit. The error should ideally represent the notion that the operation was cancelled.

3. Change the AccessTokenLoader completion handler to look more like the initial version, where it still calls the callback even if self is nil. With the pending completion handlers approach, this requires moving the pending completion handlers into a reference-type wrapper and strongly referencing that from the AccessTokenLoader callback. In this case, that's pretty much the same as approach #1, so you should just go ahead and use approach #1, but in a more complex class this approach might have some utility.
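Approach #1 can be sketched in Python, where reference counting makes the "operation keeps the service alive" idea directly visible; the class and loader below are hypothetical stand-ins for the article's Swift types:

```python
import threading

class AccessTokenService:
    """Sketch of approach #1: the in-flight operation holds a strong
    reference to the service, so it cannot be deallocated before the
    completion handlers fire."""
    def __init__(self, loader):
        self._loader = loader
        self._pending = []

    def retrieve_token(self, completion):
        self._pending.append(completion)
        # `on_loaded` closes over `self` strongly -- the moral equivalent
        # of capturing `self` instead of `[weak self]` in Swift.
        def on_loaded(token):
            for handler in self._pending:
                handler(token)
            self._pending.clear()
        self._loader(on_loaded)

calls = []
finished = threading.Event()

def fake_loader(callback):
    # Simulate an asynchronous load completing on another thread.
    def later():
        callback("token-123")
        finished.set()
    threading.Timer(0.01, later).start()

service = AccessTokenService(fake_loader)
service.retrieve_token(lambda t: calls.append(t))
service = None                  # caller drops its reference...
finished.wait(timeout=5)
print(calls)                    # → ['token-123']  (handler still fired)
```

Because the loader's callback keeps the service alive, dropping the caller's reference mid-flight no longer loses the completion.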




Absolutely this. See https://google.github.io/guava/releases/23.0/api/docs/src-ht... for a discussion of some of the dangers of invoking completion listeners synchronously.


You have it backwards, it's the caller's bug if they assume order in an async process.


No no no no no. He's right, and this is a hard-learned lesson that many practitioners have discovered independently.

Both the caller and callee face huge problems of reentrant function calls when you mix up the behavior.

Generally the right choice is to make every callback invoked asynchronously, unless you need to do something else.


Can you explain what problems someone faces if a callback is executed immediately?


One usual issue on the callee side is that the code inside the callback might call again into the same component (here: AccessTokenLoader::retrieveToken). When this call is done, state inside the component is modified before the callback returns. When the callback returns the component is now no longer in the state that it was, which might not be expected. In C++, it could even happen that the object was deleted within the callback. Then accessing any local field after the callback returns leads to a segfault.
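A minimal Python sketch of that callee-side reentrancy (illustrative names, not anyone's real code): the component sets some state, invokes the callback synchronously, and the callback calls back in, so the state the component relies on afterwards has already been clobbered.

```python
class Component:
    """Sketch of the reentrancy hazard: the callback calls back into the
    same component, mutating state mid-flight."""
    def __init__(self):
        self.busy = False
        self.log = []

    def run(self, callback):
        self.busy = True
        callback(self)            # invoked synchronously...
        # ...so by the time we get here, a reentrant call may have
        # already flipped `busy` back to False:
        self.log.append(("after callback, busy =", self.busy))
        self.busy = False

comp = Component()
def reenter(c):
    if len(c.log) == 0:
        c.run(lambda _: None)     # reentrant call runs to completion first
comp.run(reenter)
# The inner run (first log entry) sees busy == True, but the *outer* run,
# which also set busy = True, finds it already reset:
print(comp.log)   # → [('after callback, busy =', True), ('after callback, busy =', False)]
```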

One remedy is to only invoke callbacks at the very end of the logic, and not touch any state after the callback returns. That works in some cases; even so, scheduling the callback asynchronously is often easier to reason about.

Then there are also issues on the caller side, if the caller doesn't expect that the callback is sometimes made synchronously and sometimes asynchronously. If it's assumed that the callback happens asynchronously and some state is modified in it, the code that runs after invoking the initial function might work on wrong assumptions (wasn't aware that state could have been modified).


In these examples, let's suppose function (or object) A passes the callback function C to function (or object) B. Function B is the one that calls the callback C, or indirectly causes it to be called.

Scenario 1: Object B has a function body which looks like:

1. Look at or modify B's state.

2. Call callback C.

3. With code that assumes the state hasn't changed, do more stuff to B.

If C calls back into B (for example, if we'd like to perform the operation a second time, or if we'd like to perform a different operation), the assumption on line 3 might be incorrect. Or, some other assumption in the reentrant call might be incorrect.

(One common outcome is that the developers say, "Oh, there's a race condition," and they add a mutex. Later, they say, "Oh, there's a deadlock," and make it a recursive mutex. This is why recursive mutexes are a major code smell.)
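That progression can be demonstrated in a few lines of Python (a sketch, not anyone's real code): with a plain `Lock` the reentrant call would deadlock, and swapping in an `RLock` "fixes" the deadlock while leaving step 3's stale-state assumption just as broken.

```python
import threading

class B:
    """Sketch of the 'add a mutex, then make it recursive' smell."""
    def __init__(self):
        self.lock = threading.RLock()   # was threading.Lock() before the "fix"
        self.value = 0

    def operate(self, callback):
        with self.lock:
            self.value += 1             # step 1: modify state
            snapshot = self.value
            callback(self)              # step 2: call C synchronously
            # step 3: assumes value is unchanged since the snapshot
            return self.value == snapshot

b = B()
ok = b.operate(lambda inner: inner.operate(lambda _: None))  # C reenters B
print(ok)   # → False: the recursive mutex let the reentrant call through,
            #   so step 3's assumption silently failed
```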

Scenario 2: The developer realizes a safe callback has to be called after all other side effects are complete. So the developer writes this for their function body:

1. Look at or modify B's state.

2. Decide you're going to call C later.

3. Do more stuff to B.

4. Call callback C, literally the last thing you do in the function.

(The developer is an experienced C++ developer and knows to make sure object destructors for variables going out of scope happen before step 4.)

The potential problem here is on A's and C's side. What if they write:

A-1. Acquire mutex M, and look at or modify A's state.

A-2. Call into B, with callback C (which affects the state of A).

A-3. With code that assumes the state hasn't changed, do more stuff to A.

A-4. Release mutex M.

In callback C (in some helper function it invokes):

C-1. Acquire mutex M.

C-2. Do stuff assuming we have exclusive access to the state of A.

C-3. Release mutex M.

If M is non-recursive, you have a deadlock. If M is a recursive mutex (or nonexistent), the assumption on line A-3 is incorrect. Or maybe the assumption on line C-2 is incorrect.
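The deadlock branch can be shown directly in Python with a non-recursive lock; the timeout below exists only so the demo detects the deadlock instead of hanging forever (names are hypothetical, mirroring the A/B/C sketch above):

```python
import threading

M = threading.Lock()      # non-recursive mutex guarding A's state

def B(callback):
    # B calls the callback synchronously, as in scenario 2 step 4.
    callback()

deadlocked = []
def C():
    # C-1: try to acquire M -- but A already holds it (A-1), and we are
    # on A's thread, so a plain Lock can never be acquired again here.
    got = M.acquire(timeout=0.2)   # a real deadlock; the timeout just detects it
    if got:
        M.release()
    else:
        deadlocked.append(True)

with M:        # A-1: acquire M
    B(C)       # A-2: synchronous call into B, which calls C
               # A-3/A-4 are never reached as intended
print(deadlocked)   # → [True]
```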

Scenario 3: The developer is now super-paranoid about side effects of their callback. So they write all their callbacks like this:

    E-1: callAsynchronously(function() {
        C-1: Acquire mutex M.
        C-2, C-3: as before.
    }).
Well now you have no performance advantage relative to just having object B call asynchronously.

Oh, and one more.

Scenario 4: The developer's a freakin' genius. Writes out dead-perfect hypersecure code. Can run a 5-minute mile, and smells nice afterwards. Then enough synchronous callbacks happen recursively to produce a stack overflow.
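Scenario 4 is easy to reproduce: if each completion synchronously kicks off the next step, every "async-looking" step is really a stack frame. A Python sketch, with a simple queue-based trampoline showing why scheduling callbacks keeps the stack flat:

```python
import sys
from collections import deque

def call_sync(n, done):
    # Each completion synchronously starts the next operation,
    # so n operations means n nested stack frames.
    if n == 0:
        done()
    else:
        call_sync(n - 1, done)

results = []
try:
    call_sync(sys.getrecursionlimit() * 2, lambda: results.append("finished"))
except RecursionError:
    results.append("stack overflow")

# Scheduling each step through a queue (a trampoline) keeps the stack
# flat no matter how many steps there are:
pending = deque()
def call_queued(n, done):
    if n == 0:
        pending.append(done)
    else:
        pending.append(lambda: call_queued(n - 1, done))

call_queued(sys.getrecursionlimit() * 2, lambda: results.append("finished"))
while pending:
    pending.popleft()()

print(results)   # → ['stack overflow', 'finished']
```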

----

The thing about these scenarios is that if you call the callback function synchronously every time, you're more likely to catch the bug. It'll show up in tests, or you'll just get a deadlock, or some debug-mode "DebugMutex" assertion failure.

Also, if F always invokes its callback asynchronously, and G always invokes its callback synchronously, you can't call F from G without breaking G's synchronous contract. And you have to add a wrapper around your callback when calling G from F, to restore the asynchrony.

If, for a particular function, you have inconsistent behavior, you'll have a whole bunch of undertested edge cases. And they might involve any function up the callstack -- If F1 calls F2 calls F3 calls F4, and once in a while F4 invokes synchronously, that could cause F3, F2, and F1 to have that same property (if they don't do further async operations in the callbacks they pass up).


Thanks for the detailed reply.

Does scheduling work via queues (channels, they call them now, right? :)) alleviate most of these issues, while sacrificing some efficiency?

The scenarios you've described are brittle by design - I imagine better designs are out there?


I don't know what exactly a queue or channel is as you mean it, so it could depend, but the short answer is if you can throw in real mutexes and won't risk deadlock then that's a big mitigation.

A good way to improve all the scenarios I've mentioned is to return a future instead of calling a callback. Many of the problems basically go away, if you use futures for everything. There is still some danger of scenario 2 at the boundary between futures and other code. Some futures libraries in systems that have an event loop (like JavaScript, I've been told) always schedule futures' callbacks asynchronously partly for this reason. But others don't, partly because they're written not to depend on an event loop. A consequence is also better performance, but a risk of stack overflow.
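Python's standard library happens to exhibit both designs mentioned here, which makes the trade-off easy to observe: `concurrent.futures.Future` runs a done-callback immediately (synchronously) if the future has already completed, while `asyncio.Future` always schedules it through the event loop via `call_soon`, even when the future is already resolved.

```python
import asyncio
from concurrent.futures import Future as ThreadFuture

order = []

# concurrent.futures: a callback added to an already-completed future
# runs synchronously, right inside add_done_callback:
f = ThreadFuture()
f.set_result("done")
f.add_done_callback(lambda _: order.append("cf callback"))
order.append("after cf add")

# asyncio: done-callbacks are always scheduled with loop.call_soon,
# so they run on a later event-loop iteration:
async def main():
    fut = asyncio.get_running_loop().create_future()
    fut.set_result("done")
    fut.add_done_callback(lambda _: order.append("aio callback"))
    order.append("after aio add")
    await asyncio.sleep(0)        # yield so call_soon callbacks can run

asyncio.run(main())
print(order)
# → ['cf callback', 'after cf add', 'after aio add', 'aio callback']
```

The asyncio behavior is the "always asynchronous" discipline advocated upthread; the concurrent.futures behavior is faster on the already-done path but reintroduces the sometimes-sync hazard.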


Exactly. UITableView is a good example of how _not_ to do things.


Yeah, the mixup of MVC layers is terrible, and anything beyond a hard reloadData() has many edge cases.





