Argh, I really wish we could implement the tail-call proposal in wasm2c (while preserving the straightforward transformation of Wasm functions -> C functions), but it seems like it's going to be one of the tougher proposals to handle.
LLVM has everything needed via the musttail marker, but the musttail attribute in Clang is a lot more restrictive (it requires that the caller and callee have identical function signatures).
It might theoretically be possible to get more flexibility plumbed through to clang, but then I'm guessing getting GCC on board would be a huge job too. :-/
Scheme implementations targeting C (e.g. Chicken[1] and Gambit[2]) have literal decades of experience doing tail calls, but admittedly the resulting translations (trampolining, Cheney on the MTA, whatever Gambit does) do not make for very natural C. The C paper[3] you mention does make note of this.
The limits of Clang’s musttail stem from the limitations of calling conventions (which, regardless of what the standard says, allow passing more arguments than the callee expects, and thus require caller cleanup). So far the solutions[4] to this seem rather awkward and definitely not backwards compatible. Are you aware of any convincing proposals in this direction?
I've been able to get around this in language-to-C compilation by inserting trampoline macros. The first instance of the call deploys the trampoline function; further tail calls return to it with the next function pointer as a parameter.
wasm3 uses tail calls to implement its interpreter bytecode handlers, and it manages to successfully force tail-call optimization in both GCC and LLVM. Worth having a look at how it does that?
The problem with those more complex approaches isn't just the complexity itself; it's also that they're not as fast as a proper implementation could be.
> One observable consequence of tail calls is (besides a reduced risk of stack overflow) that tail callers do not appear in stack traces. Neither do they appear in the stack property of a caught exception, nor in the DevTools stack trace. By the time an exception is thrown, or execution pauses, the tail caller frames are gone and there is no way for V8 to recover them.
It seems like a compiler could unroll a function one step, so the first internal call isn't a tail call? Then the first stack frame with the initial arguments would remain on the stack, and the stack trace would be similar to what it would be if you wrote a loop instead.
> One observable consequence of tail calls is (besides a reduced risk of stack overflow) that tail callers do not appear in stack traces. Neither do they appear in the stack property of a caught exception, nor in the DevTools stack trace.
This is because stack traces are continuations; i.e. they show the future of our computation; those things which have not yet been done.
Reaching a function's tail position means, by definition, that its computations are finished. That's why such functions don't appear in stack traces; the same way that, say, a call like 'foo(bar(42))' won't include 'bar' in the stack trace if 'foo' crashes.
> By the time an exception is thrown, or execution pauses, the tail caller frames are gone and there is no way for V8 to recover them.
This can be done by a time-travelling debugger; since that keeps a trace of past computations. It also adds overhead, though.
Sometimes it's not just a single function calling itself, but a whole bunch of functions calling into each other, which can get more confusing.
This is also tricky when you have indirect tail calls, because now you don't know if it's the first call to that particular function so you can't always unroll it anyway.
I'm thinking of something like an interpreter, where each function implements one instruction and it does tail calls to go to the next instruction, instead of having a giant switch statement in a loop.
To match the implementation without tail calls, you probably don't want the previous instruction(s) on the stack, but all entry points into the interpreter should probably be on the stack.
One thing that I wonder about is: could we write extra code that makes those reversible? I understand that there are no stack frames other than the topmost one, but maybe if we recorded the first and last one as well as a copy of the arguments, it could be enough to reconstruct the flow?
It really is. I often have to go and modify some (tail-call-optimised-away) code just to get it to turn up in a stack trace so that I can continue with what I’m debugging.
Just yesterday I had to reason about a stack trace involving tail calls. I worked it out. I wouldn't trade TCO for stack traces that I don't have to reason about.
What isn't clear to me is why tail calls need to be implemented in WASM, rather than in the compiler. The post linked to Josh Haberman's post on tail calls, which shows how tail calls can help the compiler decide where to inline (cool!). But that was needed for the C++ source, not the LLVM code. It feels like tail calls are too high-level a concept to be in an "assembly" language.
If a function calls itself using tail-recursion, a compiler can turn that into a loop without too much trouble. However, if it's tail-calling a different function then that becomes more difficult; it would have to merge the two functions into a single WASM function. And if the tail-call is indirect (through a function pointer) then it is impossible to turn into a loop.
> It feels like tail calls are too high level of a concept to be in an "assembly" language.
WASM is a bit higher-level than typical assembly languages. It doesn't have an unrestricted "goto", so there's no way to implement tail-call optimization the "hard way".
> It feels like tail calls are too high level of a concept to be in an "assembly" language.
In x86, the difference is between `call some_fn` and `jmp some_fn` (perhaps after a `call _guard_check_icall` to implement Control Flow Guard)
In WASM, the difference is between `call[_indirect] some_fn` and `return_call[_indirect] some_fn`, which is just `jmp some_fn` with a funny name - and similar control flow integrity constraints as imposed by CFG. Why not just use `jmp some_fn`? Because WASM lacks a generic `jmp some_fn`, as they've baked https://webassembly.org/docs/security/#control-flow-integrit... into the arch, which seems fair enough.
To add to other reasons, my opinion is: WebAssembly right now is an extremely easy target for compilation. You can create your own compiler quite easily and get plenty of V8 optimizations for free. It's like LLVM, but probably much more accessible for hobby projects. You just parse text, build an AST, and dump the AST to WebAssembly.
If WebAssembly implemented tail calls, supporting them in a source language would require zero effort; they would just work.
But implementing tail calls with some kind of optimization, like rewriting functions into loops, is far from easy.
So my personal hope is that WebAssembly will include features that are possible but very hard to develop in a hobby project. GC is another example: even a primitive toy GC is a serious project.
Because the general tail-call problem isn't self-recursion - that is indeed trivial to flatten - but mutual recursion, or recursion where you don't know the target.
Mutual recursion (f calls g calls h calls f, etc.) requires you to create a single function containing the bodies of all the functions that call each other via tail calls, put them all in a loop, and have a single stack frame capable of holding all the independent frames concurrently. It's doable, but taking my dumb example you'd need to do this for each of f, g, and h, or you have to convert those functions into wrappers around one giant mega-function. The mega-function then has issues if it makes non-tail recursive calls that re-enter it, because its own frame is large - and it has to be large because WASM is structured: the bytecode can't just treat the stack as a general-purpose blob of memory.
Dynamic recursion is a case where you quite simply cannot do compile-time flattening, e.g.

    int f(int (*g)(void)) {
        return g();
    }
Languages that directly target machine code can do this because they just have to rebuild the stack frame at the point of the call and perform a jump rather than a call/bl instruction. WASM bytecode can't do that, as it needs to be safe: the stack is structured and unrestricted jumps aren't an option.
Now it's not uncommon for the various JS/WASM engines to perform tail call optimizations anyway, but the important thing that many languages need is guaranteed tail calls, e.g. a way to say "hello runtime, please put in the work to make this a tail call even if you haven't decided the function is hot enough to warrant optimizations, as I require TCO for correctness".
For example, let's imagine this silly factorial function:
    function factorial(n, accumulator = 1) {
      if (n <= 1) return accumulator;
      return factorial(n - 1, accumulator * n);
    }
In the absence of guaranteed TCO, this might fail for a large enough argument n. If the function is called enough, a JIT might choose to run some optimization passes that perform TCO, and then this works for any value of n. So for correctness it is necessary to be able to guarantee that TCO will occur. In wasm that's apparently [going to be?] an annotated call instruction, which is what .NET's VM does as well. The JS TCO proposal a few years back simply said something like "if a return's expression is a function call, the call must be a tail call".
WASM is not really an assembly language. Before this, WASM didn't have a general jump at all, and tail calls add a form of jump (jump with arguments). This makes WASM a much better compilation target.
It doesn't and I hope whoever is in charge of web assembly doesn't ruin what they have by giving in to nonsense trends that are part of a silver bullet syndrome and don't offer real utility.
For example, it's done by all of the Schemes, the MLs (SML, OCaml, etc.), Haskell, Scala, Idris, scripting languages (Lua, Perl, Tcl, etc.), and many others.
> that are part of a silver bullet syndrome and don't offer real utility
Their "real utility" is right there in the subtitle of that original paper:
Debunking the "Expensive Procedure Call" Myth
or, Procedure Call Implementations Considered Harmful
or, Lambda: The Ultimate GOTO
In other words, any implementation of procedure/function/method-calls which doesn't eliminate tail-calls is defective (slow, memory-hungry, stack-unsafe, etc.)
Not standard, since almost no programming is actually done using tail calls. Programming is done with loops; tail calls are an exotic and niche way of working.
Even in Lua and Rust tail calls are rarely used, and the other languages you listed are extremely niche.
> Debunking the "Expensive Procedure Call" Myth
> or, Procedure Call Implementations Considered Harmful
> or, Lambda: The Ultimate GOTO
These are not explanations of utility, these are titles that you are using to claim something without backing it up.
> In other words, any implementation of procedure/function/method-calls which doesn't eliminate tail-calls is defective (slow, memory-hungry, stack-unsafe, etc.)
This is a very bold claim with no evidence for it and pretty much the entire history of programming against it.
Also, it was literally the title of a paper. If citing the literature with a hyperlink to wikisource doesn't count as "backing it up", then I have no idea where you put the goalposts.
> almost no programming is actually done using tail calls
Literally every function/procedure/method/subroutine/etc. has at least one tail position (branching allows more than one). It's pretty bold to claim that there are 'almost no' function calls in those positions. I wouldn't believe this claim without seeing some sort of statistics.
> Programming is done with loops, tail calls are an exotic and niche way of working.
Loops have limited expressiveness; e.g. they don't compose, they break encapsulation, etc. Hence most (all?) programs utilise some form of function/method/procedure/subroutine/GOTO. Tail-calls are simply a sub-set of the latter which, it turns out, are more powerful and expressive than loops (as an obvious example: machine-code doesn't have loops, since it's enough to have GOTO (AKA tail-calls)).
That wasn't 'me claiming something', it was Guy L Steele Jr.
You still just copy-and-pasted titles; this isn't evidence of anything.
> If citing the literature
Then put in the part you think is evidence or significant. This is the classic "prove my point for me" routine. You're the one who wants to change a standard.
> Literally every function/procedure/method/subroutine/etc. has at least one tail position
Are you conflating general functions with tail call elimination?
> Loops have limited expressiveness; e.g. they don't compose, they break encapsulation,
Why would that be true? How would looping through recursion change this?
> Hence most (all?) programs utilise some form of function/method/procedure/subroutine/GOTO
What does this have to do with tail call optimizations? Web assembly has functions.
> machine-code doesn't have loops,
Web assembly is not the same as machine code
I think overall you are thinking that making claims is the same as evidence. You haven't explained any core idea why tail call optimizations have any benefit in programming or web assembly. You basically just said a well known language creator put them in some languages. There is no explanation of what problem is being solved.
Indeed; if web assembly allowed unrestrained GOTOs (like machine code) then compilers would already be able to do tail-call elimination.
---
> Are you conflating general functions with tail call elimination?
> What does this have to do with tail call optimizations?
> You haven't explained any core idea why tail call optimizations have any benefit
Sorry, I think there have been some crossed wires: I was mostly pointing out the absurdity of your statement that "almost no programming is actually done using tail calls" (when in fact, almost all programs will contain many tail-calls).
That's separate to the question of how tail-calls should be implemented: in particular, whether they should perform a GOTO (AKA tail-call elimination/optimisation); or, alternatively, whether they should allocate a stack frame, store a return address, then perform a GOTO, then later perform another GOTO to jump back, then deallocate that stack frame, etc.
> You haven't explained any core idea why tail call optimizations have any benefit in programming or web assembly
Based on my previous sentence, I would turn the question around: what benefit is gained from allocating unnecessary stack frames (which waste memory and cause stack overflows), performing redundant jumps (slowing down programs), etc.?
> almost no programming is actually done using tail calls
I'm talking about tail call optimization, which is what this whole thread was about; what are you talking about?
> Based on my previous sentence, I would turn the question around:
That's not how it works, since you're the one wanting a standard to change.
> what benefit is gained from allocating unnecessary stack frames
Where are you getting the idea that the webasm JIT has to allocate 'unnecessary stack frames'?
> stack frames (which waste memory and cause stack overflows)
This makes me think you don't understand how the stack even works. The memory is already there and stack memory is very small. You can't both 'waste memory' and overflow the stack at the same time. This stuff is fundamental to how computers work.
It seems like you don't even know or understand what problem you are trying to solve. Where did you get this absurd idea that the stack is a problem? Also do you think that rust doesn't use a stack?
It would help any recursively-defined code. Parsers for some input formats are often more naturally represented that way, for instance: a JS JSON parser might determine whether the current character starts an object, array, or leaf type, and then pass the input along to whichever that is. If it was an array, for instance, it would then pass back to that initial function one character over, to determine the type of the first element of the array. This is easy in a recursive style, but in an imperative style it would require all of this logic to be mixed into the same function, with some state-tracking variable to decide which branch to use.
You can use trampolines[1] to get around this, but there's a performance penalty versus TCO and other recursive optimizations.
Honestly, the impression I get is that it seemed like a cool thing to have, rather than something that came out of genuine need. I very rarely see recursive functions in the wild, and while I quite like them and probably use them more often than average, I rarely write tail-recursive functions.
Specifically, Javascript has pretty good iteration tools, so the basic recursion-as-iteration stuff isn't necessary, and is usually pretty inefficient. Which means recursion tends to only get used when you're dealing with a recursive data structure of some description, or doing something naturally recursive. But in my experience these cases of natural recursion tend not to lend themselves to tail recursion - in the general case, you need a stack argument somewhere to allow the tail recursion to work, at which point you might as well just write a non-recursive, loop-based version.
The result of this is that truly tail-recursive functions are fairly rare in "idiomatic" Javascript code, which means implementation effort spent there is going to affect relatively little code. I could imagine that transpiled code might use this technique more often, but then the rise of WASM probably means that fewer languages are going to see Javascript as their primary compilation target, and instead choose WASM.
Natural recursion that stems from data structures will be tail-recursive only if the structures themselves are recursive in a linear manner. Thus e.g. walking a linked list is a problem that naturally translates to tail recursion, but walking a binary tree isn't.
But also, part of the reason why you don't see much deeply recursive code in the wild is because there's no guarantee that it'll be tail-call-optimized, which means that it'll likely fail on large inputs, and will be written iteratively to begin with.
Even many operations performed on a linked list will not be tail recursive if you do them naively. For example, a naive sum function might look like:
    const sum = (node: Node): number => {
      if (!node.tail) return node.value;
      return node.value + sum(node.tail);
    }
Even if tail call optimisation were a thing, I'd still expect most recursive functions to be written in a non-tail-recursive fashion, and only later optimised to be tail recursive. But in practice, it's already possible to rewrite those functions in an imperative way when they do become a performance issue.
So in practice, I can see why implementors don't prioritise TCO in Javascript, coming from both directions: there's not a lot of free wins available, because tail calls are rarely the most natural way to express a recursive algorithm; and even if you want to explicitly optimise for them, there's already plenty of ways to optimise an inefficient recursive algorithm if that's necessary.
That said, I can see the value a lot more in a language that's being used as a compilation target, which it probably made more sense when there was more buzz around compile-to-JS languages, and why it makes more sense now with WebAssembly.
Very exciting to see this continuing to progress and hats off to those who kept pushing on the proposal to get it here.
I've been building a little compiler from a toy dialect of Lisp to WASM as a side project but let it fall by the wayside. I should pick it back up now that tail calls are usable in several runtimes.
The C committee voted 8-7-8 against including mandatory tail calls in C23 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2941.pdf), mostly I think because of a lack of implementation experience.