Edit: Seeing it written like this now, it's clear you could save yet another jump by defining a macro for what's in the while loop and putting it at the end of every function call.
That's an additional pointer chase per loop iteration, plus more function prologue / epilogue work. In practice I suspect it'd be optimized out, but you cannot say "you cannot do X because it relies on compiler optimizations" and then replace it with Y, which also relies on compiler optimizations.
A function call is nothing more than putting your return address and parameters on the stack and jumping to the address of the function. By referring directly to the function's address, there's no "additional pointer chase", since calling the function already does exactly that.
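To make the loop version concrete, here is a minimal sketch of table-driven dispatch, assuming a tiny made-up stack machine. The opcodes, the `VM` struct, and `run_program` are hypothetical names for illustration; only the "perform" naming comes from this thread. Each instruction costs one load from the table plus one indirect call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_PUSH1, OP_ADD, OP_HALT, OP_COUNT };

typedef struct {
    const unsigned char *pc; /* next opcode to execute */
    int stack[16];
    size_t sp;               /* number of values on the stack */
    int running;
} VM;

typedef void (*Handler)(VM *);

static void perform_push1(VM *vm) {
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = 1;
}

static void perform_add(VM *vm) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void perform_halt(VM *vm) {
    vm->running = 0;
}

/* Table indexed by opcode; each entry is the address of a handler. */
static const Handler handlers[OP_COUNT] = {
    perform_push1, perform_add, perform_halt
};

/* Runs until OP_HALT and returns the top of the stack (0 if empty). */
int run_program(const unsigned char *code) {
    VM vm = { code, {0}, 0, 1 };
    while (vm.running) {
        unsigned char op = *vm.pc++;
        assert(op < OP_COUNT);
        handlers[op](&vm); /* one table load and one indirect call */
    }
    return vm.sp ? vm.stack[vm.sp - 1] : 0;
}
```

Calling `run_program` on the sequence push, push, add, halt leaves 2 on top of the stack. Whether the call through the table costs anything beyond a direct call depends on the compiler and the target, so treat this as a shape to reason about rather than a benchmark.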
If you compare inlining all of the perform functions in the GOTO version against putting a macro at the end of each function, you're right that there's some function-call overhead, but I think it's as small as a single instruction to push the return address onto the stack. Maybe that would be optimized away, maybe not.
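A rough sketch of the macro-at-the-end-of-each-function idea, under the same made-up stack machine assumptions (the `DISPATCH` macro, `VM`, the opcodes, and `run_threaded` are hypothetical names). Each handler fetches the next opcode and calls its handler as the last thing it does. Standard C does not guarantee that this trailing call becomes a plain jump; if the compiler doesn't do that, every instruction nests one more stack frame, so this only suits short programs unless you've checked the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_PUSH1, OP_ADD, OP_HALT, OP_COUNT };

typedef struct {
    const unsigned char *pc; /* next opcode to execute */
    int stack[16];
    size_t sp;               /* number of values on the stack */
} VM;

typedef void (*Handler)(VM *);

static void perform_push1(VM *vm);
static void perform_add(VM *vm);
static void perform_halt(VM *vm);

static const Handler handlers[OP_COUNT] = {
    perform_push1, perform_add, perform_halt
};

/* Fetch the next opcode and call its handler as the final action. */
#define DISPATCH(vm)                     \
    do {                                 \
        unsigned char op_ = *(vm)->pc++; \
        assert(op_ < OP_COUNT);          \
        handlers[op_](vm);               \
        return;                          \
    } while (0)

static void perform_push1(VM *vm) {
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = 1;
    DISPATCH(vm);
}

static void perform_add(VM *vm) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
    DISPATCH(vm);
}

/* Halting simply returns, which unwinds the chain of handler calls. */
static void perform_halt(VM *vm) {
    (void)vm;
}

/* Starts the chain with the first opcode; returns the top of the stack. */
int run_threaded(const unsigned char *code) {
    VM vm = { code, {0}, 0 };
    unsigned char op = *vm.pc++;
    assert(op < OP_COUNT);
    handlers[op](&vm);
    return vm.sp ? vm.stack[vm.sp - 1] : 0;
}
```

There is no central loop here: the jump back to the top of a `while` is gone, at the price of repeating the dispatch code in every handler and leaning on the optimizer to keep the stack flat.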
To your point: my argument isn't that you can't do the GOTO version. With optimizations, it's essentially identical. My point is that it's a lot more hard-to-grok code to maintain for something that can be achieved in a simpler way.
You're forgetting the overhead of pushing / popping registers, both in terms of direct instruction overhead and indirectly through code size and working-set bloat. Especially for smaller functions, that can be significant. It's one of the problems with a register-oriented architecture.
Sometimes this can be optimized away, but not always.
In your example:
Edit: Seeing it written like this now, it's clear you could save yet another jump by defining a macro for what's in the while loop and putting it at the end of every function call.