
This code has a significant and common flaw: If the token is cached, it invokes the handler synchronously.

You should never design a function with a completion handler that sometimes invokes it synchronously and sometimes asynchronously. That can cause all sorts of problems, the simplest of which is a caller that expects it to be asynchronous and ends up performing work out-of-order because the handler was invoked synchronously.
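To make the hazard concrete, here is a minimal Python sketch (the names `TokenService` and `fetch_token` are hypothetical stand-ins for the article's Swift code): with a warm cache, the handler fires before the line that follows the call, inverting the order the caller expected.

```python
import threading

class TokenService:
    """Illustrative sketch: a cache hit fires the handler synchronously,
    a cache miss fires it from a background thread."""
    def __init__(self):
        self.cached = None

    def fetch_token(self, completion):
        if self.cached is not None:
            completion(self.cached)          # synchronous path
        else:
            def work():
                self.cached = "token-123"
                completion(self.cached)      # asynchronous path
            threading.Thread(target=work).start()

# A caller that assumes the handler runs *after* the lines below the call:
events = []
service = TokenService()
service.cached = "token-123"                 # simulate a warm cache
service.fetch_token(lambda t: events.append("handled"))
events.append("prepared")                    # meant to run first!
print(events)                                # → ['handled', 'prepared']
```

With a cold cache the same program prints the other order, which is exactly the inconsistency being criticized.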

The final form of the code in this article accidentally fixes this, by moving the work it does onto a background queue, but this just introduces a separate flaw: it performs the callback from its internal private queue. This isn't necessarily a problem, except in that it will block any other work from running on that queue unnecessarily (as it's a serial queue), but in more complicated code, having external code running on your private queue can indeed cause problems.
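One conventional fix is to hop off the private queue before invoking external code. A hedged Python sketch of that shape, using two single-thread executors to stand in for GCD serial queues (all names illustrative, not the article's API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Loader:
    """Sketch: do work on a private serial queue, but hop to a separate
    executor before invoking user code, so a slow callback cannot block
    the private queue."""
    def __init__(self):
        self._serial = ThreadPoolExecutor(max_workers=1)     # private serial queue
        self._callbacks = ThreadPoolExecutor(max_workers=1)  # where user code runs

    def load(self, completion):
        def work():
            result = "token-123"                  # pretend network call
            # Never run external code on the private queue:
            self._callbacks.submit(completion, result)
        self._serial.submit(work)

done = threading.Event()
out = []
loader = Loader()
loader.load(lambda r: (out.append(r), done.set()))
done.wait(timeout=5)
print(out)   # → ['token-123']
```

In GCD terms, the equivalent move is dispatching the completion onto the caller-supplied queue (or the main queue) rather than calling it inline on the internal serial queue.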

---

There's also another issue: if the AccessTokenService gets deinited before the loader calls its callback, the callback is dropped on the floor. In general, you should not design a function that takes a completion callback with conditions under which the callback is never fired. There are three reasonable behaviors that could be implemented here:

1. Have the callback keep the AccessTokenService alive. It's pretty common for networking operations to keep themselves alive while executing, so this is probably the approach I'd take.

2. Have the AccessTokenService fire all of its pending completion handlers with an error in deinit. The error should ideally represent the notion that the operation was cancelled.

3. Change the AccessTokenLoader completion handler to look more like the initial version, where it still calls the callback even if self is nil. With the pending completion handlers approach, this requires moving the pending completion handlers into a reference-type wrapper and strongly referencing that from the AccessTokenLoader callback. In this case, that's pretty much the same as approach #1, so you should just go ahead and use approach #1, but in a more complex class this approach might have some utility.
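Approach #1 can be sketched in Python, where reference counting makes the "operation keeps the service alive" idea directly visible; the class and loader below are hypothetical stand-ins for the article's Swift types:

```python
import threading

class AccessTokenService:
    """Sketch of approach #1: the in-flight operation holds a strong
    reference to the service, so it cannot be deallocated before the
    completion handlers fire."""
    def __init__(self, loader):
        self._loader = loader
        self._pending = []

    def retrieve_token(self, completion):
        self._pending.append(completion)
        # `on_loaded` closes over `self` strongly -- the moral equivalent
        # of capturing `self` instead of `[weak self]` in Swift.
        def on_loaded(token):
            for handler in self._pending:
                handler(token)
            self._pending.clear()
        self._loader(on_loaded)

calls = []
finished = threading.Event()

def fake_loader(callback):
    # Simulate an asynchronous load completing on another thread.
    def later():
        callback("token-123")
        finished.set()
    threading.Timer(0.01, later).start()

service = AccessTokenService(fake_loader)
service.retrieve_token(lambda t: calls.append(t))
service = None                  # caller drops its reference...
finished.wait(timeout=5)
print(calls)                    # → ['token-123']  (handler still fired)
```

Because the loader's callback keeps the service alive, dropping the caller's reference mid-flight no longer loses the completion.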




Absolutely this. See https://google.github.io/guava/releases/23.0/api/docs/src-ht... for a discussion of some of the dangers of invoking completion listeners synchronously.


You have it backwards, it's the caller's bug if they assume order in an async process.


No no no no no. He's right, and this is a hard-learned lesson that many practitioners have discovered independently.

Both the caller and callee face huge problems of reentrant function calls when you mix up the behavior.

Generally the right choice is to make every callback invoked asynchronously, unless you need to do something else.


Can you explain what problems someone faces if a callback is executed immediately?


One usual issue on the callee side is that the code inside the callback might call again into the same component (here: AccessTokenLoader::retrieveToken). When this call is done, state inside the component is modified before the callback returns. When the callback returns the component is now no longer in the state that it was, which might not be expected. In C++, it could even happen that the object was deleted within the callback. Then accessing any local field after the callback returns leads to a segfault.
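A minimal Python sketch of that callee-side reentrancy (illustrative names, not anyone's real code): the component sets some state, invokes the callback synchronously, and the callback calls back in, so the state the component relies on afterwards has already been clobbered.

```python
class Component:
    """Sketch of the reentrancy hazard: the callback calls back into the
    same component, mutating state mid-flight."""
    def __init__(self):
        self.busy = False
        self.log = []

    def run(self, callback):
        self.busy = True
        callback(self)            # invoked synchronously...
        # ...so by the time we get here, a reentrant call may have
        # already flipped `busy` back to False:
        self.log.append(("after callback, busy =", self.busy))
        self.busy = False

comp = Component()
def reenter(c):
    if len(c.log) == 0:
        c.run(lambda _: None)     # reentrant call runs to completion first
comp.run(reenter)
# The inner run (first log entry) sees busy == True, but the *outer* run,
# which also set busy = True, finds it already reset:
print(comp.log)   # → [('after callback, busy =', True), ('after callback, busy =', False)]
```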

One remedy is to only invoke callbacks at the very end of the logic, and not touch any state after the callback returns. That works in some cases; even so, scheduling the callback asynchronously is often easier to reason about.

Then there are also issues on the caller side, if the caller doesn't expect that the callback is sometimes made synchronously and sometimes asynchronously. If it's assumed that the callback happens asynchronously and some state is modified in it, the code that runs after invoking the initial function might work on wrong assumptions (wasn't aware that state could have been modified).


In these examples, let's suppose function (or object) A passes the callback function C to function (or object) B. Function B is the one that calls the callback C, or indirectly causes it to be called.

Scenario 1: Object B has a function body which looks like:

1. Look at or modify B's state.

2. Call callback C.

3. With code that assumes the state hasn't changed, do more stuff to B.

If C calls back into B (for example, if we'd like to perform the operation a second time, or if we'd like to perform a different operation), the assumption on line 3 might be incorrect. Or, some other assumption in the reentrant call might be incorrect.

(One common outcome is that the developers say, "Oh, there's a race condition," and they add a mutex. Later, they say, "Oh, there's a deadlock," and make it a recursive mutex. This is why recursive mutexes are a major code smell.)
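That progression can be demonstrated in a few lines of Python (a sketch, not anyone's real code): with a plain `Lock` the reentrant call would deadlock, and swapping in an `RLock` "fixes" the deadlock while leaving step 3's stale-state assumption just as broken.

```python
import threading

class B:
    """Sketch of the 'add a mutex, then make it recursive' smell."""
    def __init__(self):
        self.lock = threading.RLock()   # was threading.Lock() before the "fix"
        self.value = 0

    def operate(self, callback):
        with self.lock:
            self.value += 1             # step 1: modify state
            snapshot = self.value
            callback(self)              # step 2: call C synchronously
            # step 3: assumes value is unchanged since the snapshot
            return self.value == snapshot

b = B()
ok = b.operate(lambda inner: inner.operate(lambda _: None))  # C reenters B
print(ok)   # → False: the recursive mutex let the reentrant call through,
            #   so step 3's assumption silently failed
```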

Scenario 2: The developer realizes a safe callback has to be called after all other side effects are complete. So the developer writes this for their function body:

1. Look at or modify B's state.

2. Decide you're going to call C later.

3. Do more stuff to B.

4. Call callback C, literally the last thing you do in the function.

(The developer is an experienced C++ developer and knows to make sure object destructors for variables going out of scope happen before step 4.)

The potential problem here is on A's and C's side. What if they write:

A-1. Acquire mutex M, and look at or modify A's state.

A-2. Call into B, with callback C (which affects the state of A).

A-3. With code that assumes the state hasn't changed, do more stuff to A.

A-4. Release mutex M.

In callback C (in some helper function it invokes):

C-1. Acquire mutex M.

C-2. Do stuff assuming we have exclusive access to the state of A.

C-3. Release mutex M.

If M is non-recursive, you have a deadlock. If M is a recursive mutex (or nonexistent), the assumption on line A-3 is incorrect. Or maybe the assumption on line C-2 is incorrect.
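The deadlock branch can be shown directly in Python with a non-recursive lock; the timeout below exists only so the demo detects the deadlock instead of hanging forever (names are hypothetical, mirroring the A/B/C sketch above):

```python
import threading

M = threading.Lock()      # non-recursive mutex guarding A's state

def B(callback):
    # B calls the callback synchronously, as in scenario 2 step 4.
    callback()

deadlocked = []
def C():
    # C-1: try to acquire M -- but A already holds it (A-1), and we are
    # on A's thread, so a plain Lock can never be acquired again here.
    got = M.acquire(timeout=0.2)   # a real deadlock; the timeout just detects it
    if got:
        M.release()
    else:
        deadlocked.append(True)

with M:        # A-1: acquire M
    B(C)       # A-2: synchronous call into B, which calls C
               # A-3/A-4 are never reached as intended
print(deadlocked)   # → [True]
```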

Scenario 3: The developer is now super-paranoid about side effects of their callback. So they write all their callbacks like this:

    E-1: callAsynchronously(function() {
        C-1: Acquire mutex M.
        C-2, C-3: as before.
    }).
Well now you have no performance advantage relative to just having object B call asynchronously.

Oh, and one more.

Scenario 4: The developer's a freakin' genius. Writes out dead-perfect hypersecure code. Can run a 5-minute mile, and smells nice afterwards. Then enough synchronous callbacks happen recursively to produce a stack overflow.
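Scenario 4 is easy to reproduce: if each completion synchronously kicks off the next step, every "async-looking" step is really a stack frame. A Python sketch, with a simple queue-based trampoline showing why scheduling callbacks keeps the stack flat:

```python
import sys
from collections import deque

def call_sync(n, done):
    # Each completion synchronously starts the next operation,
    # so n operations means n nested stack frames.
    if n == 0:
        done()
    else:
        call_sync(n - 1, done)

results = []
try:
    call_sync(sys.getrecursionlimit() * 2, lambda: results.append("finished"))
except RecursionError:
    results.append("stack overflow")

# Scheduling each step through a queue (a trampoline) keeps the stack
# flat no matter how many steps there are:
pending = deque()
def call_queued(n, done):
    if n == 0:
        pending.append(done)
    else:
        pending.append(lambda: call_queued(n - 1, done))

call_queued(sys.getrecursionlimit() * 2, lambda: results.append("finished"))
while pending:
    pending.popleft()()

print(results)   # → ['stack overflow', 'finished']
```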

----

The thing about these scenarios is that if you call the callback function synchronously every time, you're more likely to catch the bug. It'll show up in tests, or you'll just get a deadlock, or some debug-mode "DebugMutex" assertion failure.

Also, if F always invokes its callback asynchronously, and G always invokes its callback synchronously, you can't call F from G without breaking G's synchronous contract. And you have to add a wrapper around your callback when calling G from F, to restore the asynchrony.

If, for a particular function, you have inconsistent behavior, you'll have a whole bunch of undertested edge cases. And they might involve any function up the callstack -- If F1 calls F2 calls F3 calls F4, and once in a while F4 invokes synchronously, that could cause F3, F2, and F1 to have that same property (if they don't do further async operations in the callbacks they pass up).


Thanks for the detailed reply.

Does scheduling work via queues (channels, they call them now, right? :)) alleviate most of these issues, while sacrificing some efficiency?

The scenarios you've described are brittle by design - I imagine better designs are out there?


I don't know what exactly a queue or channel is as you mean it, so it could depend, but the short answer is if you can throw in real mutexes and won't risk deadlock then that's a big mitigation.

A good way to improve all the scenarios I've mentioned is to return a future instead of calling a callback. Many of the problems basically go away, if you use futures for everything. There is still some danger of scenario 2 at the boundary between futures and other code. Some futures libraries in systems that have an event loop (like JavaScript, I've been told) always schedule futures' callbacks asynchronously partly for this reason. But others don't, partly because they're written not to depend on an event loop. A consequence is also better performance, but a risk of stack overflow.
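Python's standard library happens to exhibit both designs mentioned here, which makes the trade-off easy to observe: `concurrent.futures.Future` runs a done-callback immediately (synchronously) if the future has already completed, while `asyncio.Future` always schedules it through the event loop via `call_soon`, even when the future is already resolved.

```python
import asyncio
from concurrent.futures import Future as ThreadFuture

order = []

# concurrent.futures: a callback added to an already-completed future
# runs synchronously, right inside add_done_callback:
f = ThreadFuture()
f.set_result("done")
f.add_done_callback(lambda _: order.append("cf callback"))
order.append("after cf add")

# asyncio: done-callbacks are always scheduled with loop.call_soon,
# so they run on a later event-loop iteration:
async def main():
    fut = asyncio.get_running_loop().create_future()
    fut.set_result("done")
    fut.add_done_callback(lambda _: order.append("aio callback"))
    order.append("after aio add")
    await asyncio.sleep(0)        # yield so call_soon callbacks can run

asyncio.run(main())
print(order)
# → ['cf callback', 'after cf add', 'after aio add', 'aio callback']
```

The asyncio behavior is the "always asynchronous" discipline advocated upthread; the concurrent.futures behavior is faster on the already-done path but reintroduces the sometimes-sync hazard.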


Exactly. UITableView is a good example of how _not_ to do things.


Yeah, the mixup of MVC layers is terrible, and anything beyond a hard reloadData() has many edge cases.





