WebAssembly Tail Calls (v8.dev)
135 points by feross on April 6, 2023 | 56 comments



Argh, I really wish we could implement the tail-call proposal in wasm2c (while preserving the straightforward transformation of Wasm functions -> C functions), but it seems like it's going to be one of the tougher proposals to handle.

The C committee voted 8-7-8 against including mandatory tail calls in C23 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2941.pdf), mostly I think because of a lack of implementation experience.

LLVM has everything needed via the musttail marker, but the musttail attribute in clang is a lot more restrictive (it enforces that the caller and callee function signatures are the same).
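
For reference, a minimal sketch of what clang currently accepts (assuming clang 13+; `is_even`/`is_odd` are made-up illustrations, and note the identical signatures):

    #include <stdio.h>

    /* Hypothetical mutually recursive checkers; both are int(unsigned),
       since clang's musttail insists the caller and callee prototypes match. */
    static int is_odd(unsigned n);

    static int is_even(unsigned n) {
        if (n == 0) return 1;
        __attribute__((musttail)) return is_odd(n - 1);
    }

    static int is_odd(unsigned n) {
        if (n == 0) return 0;
        __attribute__((musttail)) return is_even(n - 1);
    }

    int main(void) {
        /* With musttail this runs in constant stack space even at -O0;
           without it, a deep enough input would overflow the stack. */
        printf("%d\n", is_even(1000000u));
        return 0;
    }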

It might theoretically be possible to get more flexibility plumbed through to clang, but then I'm guessing getting GCC on board would be a huge job too. :-/


Scheme implementations targeting C (e.g. Chicken[1] and Gambit[2]) have literal decades of experience doing tail calls, but admittedly the resulting translations (trampolining, Cheney on the MTA, whatever Gambit does) do not make for very natural C. The C paper[3] you mention does make note of this.

The limits of Clang’s musttail stem from the limitations of calling conventions (which, regardless of what the standard says, allow passing more arguments than the callee expects and thus require caller cleanup). So far the solutions[4] to this seem rather awkward and definitely not backwards compatible. Are you aware of any convincing proposals in this direction?

[1] https://www.call-cc.org/

[2] https://gambitscheme.org/

[3] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf

[4] https://www.complang.tuwien.ac.at/schani/diplarb.ps


I've been able to get around this in language-to-C compilation by inserting trampoline macros. The first instance of the call deploys the trampoline function; further tail calls return to it with the next function pointer as a parameter.
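
For the curious, a stripped-down sketch of that shape (names made up; a real compiler also threads the call's arguments through, and the function-pointer/void* cast is the usual trampoline hack rather than strictly conforming C):

    #include <stddef.h>
    #include <stdio.h>

    /* Each compiled function returns the next function to call (or NULL to
       stop) instead of performing the tail call itself. */
    typedef void *(*fn_t)(long *acc);

    static void *step_done(long *acc);

    static void *step_add(long *acc) {
        *acc += 1;
        return (void *)step_done;   /* "tail call" step_done via the trampoline */
    }

    static void *step_done(long *acc) {
        (void)acc;
        return NULL;                /* no further call: unwind to the trampoline */
    }

    /* The trampoline is the only frame that stays on the C stack. */
    static void trampoline(fn_t entry, long *acc) {
        fn_t next = entry;
        while (next != NULL)
            next = (fn_t)next(acc);
    }

    int main(void) {
        long acc = 41;
        trampoline(step_add, &acc);
        printf("%ld\n", acc);       /* prints 42 */
        return 0;
    }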


wasm3 uses tail calls to implement its interpreter bytecode handlers, and it manages to successfully force tail-call optimization in both gcc and llvm. Worth having a look at how it does that?


It's possible, but GCC requires optimizations to be enabled for it. MSVC is completely off-limits with that approach.


Sounds like you can do it, just complex.

don't let complexity stop you!


The problem with those more complex approaches isn't just the complexity itself - it's also that they're not as fast as a proper implementation could be.


> One observable consequence of tail calls is (besides a reduced risk of stack overflow) that tail callers do not appear in stack traces. Neither do they appear in the stack property of a caught exception, nor in the DevTools stack trace. By the time an exception is thrown, or execution pauses, the tail caller frames are gone and there is no way for V8 to recover them.

It seems like a compiler could unroll a function one step, so the first internal call isn't a tail call? Then the first stack frame with the initial arguments would remain on the stack, and the stack trace would be similar to what it would be if you wrote a loop instead.
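
Something like this, I imagine - a hand-written sketch of the split (in WASM terms, the wrapper's call would be emitted as a plain `call` and only the worker would use `return_call`; the names are made up):

    /* Tail-recursive worker: its frames get reused, so none of its
       iterations show up in a stack trace. */
    static long fact_worker(long n, long acc) {
        if (n <= 1) return acc;
        return fact_worker(n - 1, acc * n);
    }

    /* Unrolled entry point: the compiler keeps this as an ordinary call, so
       the frame holding the original argument n remains visible, much like
       the single frame you'd see if the whole thing were written as a loop. */
    long fact(long n) {
        return fact_worker(n, 1);
    }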


> One observable consequence of tail calls is (besides a reduced risk of stack overflow) that tail callers do not appear in stack traces. Neither do they appear in the stack property of a caught exception, nor in the DevTools stack trace.

This is because stack traces are continuations; i.e. they show the future of our computation; those things which have not yet been done.

Reaching a function's tail position means, by definition, that its computations are finished. Hence why such functions don't appear in stack traces; the same way that, say, a call like 'foo(bar(42))' won't include 'bar' in a stack trace if 'foo' crashes.

> By the time an exception is thrown, or execution pauses, the tail caller frames are gone and there is no way for V8 to recover them.

This can be done by a time-travelling debugger; since that keeps a trace of past computations. It also adds overhead, though.


Everyone: We want mandatory tail call elimination.

Also everyone: We want complete stack traces.

Imagine wanting O(call depth) stack frames after demanding that the stack can only use O(1) memory.


Sometimes it's not just a single function calling itself, but a whole bunch of functions calling into each other, which can get more confusing.

This is also tricky when you have indirect tail calls, because now you don't know if it's the first call to that particular function so you can't always unroll it anyway.


I'm thinking of something like an interpreter, where each function implements one instruction and it does tail calls to go to the next instruction, instead of having a giant switch statement in a loop.

To match the implementation without tail calls, you probably don't want the previous instruction(s) on the stack, but all entry points into the interpreter should probably be on the stack.
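
Roughly this shape, assuming guaranteed tail calls so that `dispatch` never grows the stack (the opcode layout and handler names here are made up):

    #include <stdio.h>

    typedef struct vm { const unsigned char *ip; long acc; } vm;

    /* One handler per instruction; each handler tail-calls into the next. */
    typedef long (*handler_t)(vm *);

    static long op_halt(vm *v);
    static long op_inc(vm *v);

    static const handler_t table[] = { op_halt, op_inc };

    /* Dispatch is a tail call into the next instruction's handler; with
       guaranteed tail calls this is effectively a computed jump. */
    static long dispatch(vm *v) {
        return table[*v->ip](v);
    }

    static long op_inc(vm *v) {
        v->acc++;
        v->ip++;
        return dispatch(v);          /* tail position */
    }

    static long op_halt(vm *v) {
        return v->acc;
    }

    int main(void) {
        const unsigned char code[] = { 1, 1, 1, 0 };  /* inc, inc, inc, halt */
        vm v = { code, 0 };
        printf("%ld\n", dispatch(&v));                /* prints 3 */
        return 0;
    }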


You can get this without the compiler doing any unrolling, by doing non-tail calls at the points where you enter the interpreter.


One thing that I wonder about is: could we write extra code that makes those reversible? I understand that there are no stack frames other than the topmost one, but maybe if we recorded the first and last one as well as a copy of the arguments, it could be enough to reconstruct the flow?


This is really not a problem.


It really is. I often have to go and modify some (tail-call-optimised-away) code just to get it to turn up in a stack trace so that I can continue with what I’m debugging.


Just yesterday I had to reason about a stack trace involving tail calls. I worked it out. I wouldn't rather lose TCO so I could get stack traces that I don't have to reason about.


It's no worse than a do-while loop.


Why debug optimized code?


Because that's the code that throws the exceptions after the product has left the lab.


What isn't clear to me is why tail calls need to be implemented in WASM, rather than in the compiler. The post linked to Josh Haberman's post on tail calls, which shows how tail calls can help the compiler decide where to inline (cool!). But that was needed for the C++ text, not the LLVM code. It feels like tail calls are too high level of a concept to be in an "assembly" language.


If a function calls itself using tail-recursion, a compiler can turn that into a loop without too much trouble. However, if it's tail-calling a different function then that becomes more difficult; it would have to merge the two functions into a single WASM function. And if the tail-call is indirect (through a function pointer) then it is impossible to turn into a loop.
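
E.g. a hand-written illustration of that rewrite (gcd chosen arbitrarily):

    /* Self tail-recursive form */
    long gcd(long a, long b) {
        if (b == 0) return a;
        return gcd(b, a % b);   /* tail call to itself */
    }

    /* What a compiler can turn it into: overwrite the parameters and loop. */
    long gcd_loop(long a, long b) {
        while (b != 0) {
            long r = a % b;
            a = b;
            b = r;
        }
        return a;
    }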

> It feels like tail calls are too high level of a concept to be in an "assembly" language.

WASM is a bit higher level than typical assembly languages. It doesn't have unrestricted "goto", so there's no way to implement tail-call optimization the "hard way".


> It feels like tail calls are too high level of a concept to be in an "assembly" language.

In x86, the difference is between `call some_fn` and `jmp some_fn` (perhaps after a `call _guard_check_icall` to implement Control Flow Guard)

In WASM, the difference is between `call[_indirect] some_fn` and `return_call[_indirect] some_fn`, which is just `jmp some_fn` with a funny name - and similar control flow integrity constraints as imposed by CFG. Why not just use `jmp some_fn`? Because WASM lacks a generic `jmp some_fn`, as they've baked https://webassembly.org/docs/security/#control-flow-integrit... into the arch, which seems fair enough.


To add to other reasons, my opinion is: WebAssembly right now is an extremely easy target for compilation. You can create your own compiler quite easily and get plenty of V8 optimizations for free. It's like LLVM, but probably much more accessible for hobby projects. You just parse text, build an AST and dump the AST to WebAssembly.

If WebAssembly implemented tail calls, supporting them in your source language would require zero effort; they would just work.

But implementing tail calls with some kind of optimization, like rewriting functions into loops, is far from easy.

So my personal hope is that WebAssembly will include features that are possible but very hard to develop in a hobby project. GC is another example: even a primitive toy GC is a serious project.


Reference counting is a toy GC algorithm, quite simple to implement.

Making it perform as fast as tracing GC algorithms, now that is a serious project.


WASM has structured control flow, with stack frames maintained by the runtime. You cannot just jump anywhere.


Because the general tail call problem isn't self-recursion - that is indeed trivial to flatten - but co-recursion, or recursion where you don't know the target.

Co-recursion (f calls g calls h calls f, etc.) requires that you create a single function containing the bodies of all functions that will be called via tail recursion, put them all in a loop, and have a single stack frame capable of holding all the independent frames concurrently. It's doable, but taking my dumb example you'd need to do this for each of f, g, and h, or you have to convert those functions into wrapper functions for your one giant mega-function. The mega-function then has issues if it has non-tail-recursive calls that re-enter it, as its own frame is large. The stack frame is large because WASM is structured: the bytecode can't just treat the stack as a general-purpose blob of memory.
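
A hand-flattened sketch of the f/g/h example (a state variable plus one loop standing in for the three mutually tail-calling functions; the bodies here are placeholders):

    enum which { F, G, H, DONE };

    /* f, g and h merged into one function: each former tail call becomes an
       assignment to `state` plus another trip around the loop, and the one
       frame must hold the locals of all three bodies at once. */
    static int fgh_flat(int n) {
        enum which state = F;
        int result = 0;
        while (state != DONE) {
            switch (state) {
            case F:
                if (n <= 0) { result = n; state = DONE; }
                else        { state = G; }
                break;
            case G:
                n -= 1;
                state = H;      /* g tail-calls h */
                break;
            case H:
                state = F;      /* h tail-calls back into f */
                break;
            case DONE:
                break;
            }
        }
        return result;
    }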

Dynamic recursion is a case where you quite simply cannot do compile-time flattening ahead of time, e.g.

    int f(int (*g)()) {
      return g();
    }
Languages that directly target machine code can do this because they just have to rebuild the stack frame at the point of the call and perform a jump rather than a call/bl instruction. WASM bytecode can't do that, as it needs to be safe: the stack is structured and unrestricted jumps aren't an option.

Now it's not uncommon for the various JS/WASM engines to perform tail call optimizations anyway, but the important thing that many languages need is guaranteed tail calls, e.g. a way to say "hello runtime, please put the work into making this a tail call even if you haven't decided the function is hot enough to warrant optimizations, as I require TCO for correctness".

For example, let's imagine this silly factorial function:

    function factorial(n, accumulator = 1) {
      if (n <= 1) return accumulator;
      return factorial(n - 1, accumulator * n);
    }
in the absence of guaranteed TCO, this might fail for a large enough argument n. If the function is called enough, a JIT might choose to run some optimization passes that perform TCO, and then this works for any value of n. So for correctness it is necessary to be able to guarantee TCO will occur. In wasm that's apparently [going to be?] an annotated call instruction, which is what .NET's VM does as well. The JS TCO proposal a few years back simply said something like "if a return's expression is a function call, the call must be tail recursive".


WASM is not really an assembly language. Before this, WASM didn't have jumps at all, and so tail calls are adding a form of jump (a jump with arguments). This makes WASM a much better compilation target.


It doesn't and I hope whoever is in charge of web assembly doesn't ruin what they have by giving in to nonsense trends that are part of a silver bullet syndrome and don't offer real utility.


> giving in to nonsense trends

Tail-calls have been pretty standard since at least the 'Lambda: The Ultimate GOTO' paper from 1977 https://en.wikisource.org/wiki/Lambda_Papers

For example, it's done by all of the Schemes, all of the MLs (SML, Ocaml, Haskell, Rust, Scala, Idris, etc.), scripting languages (Lua, Elm, Perl, Tcl, etc.), and many others.

> that are part of a silver bullet syndrome and don't offer real utility

Their "real utility" is right there in the subtitle of that original paper:

  Debunking the "Expensive Procedure Call" Myth
  or, Procedure Call Implementations Considered Harmful
  or, Lambda: The Ultimate GOTO
In other words, any implementation of procedure/function/method-calls which doesn't eliminate tail-calls is defective (slow, memory-hungry, stack-unsafe, etc.)


Nitpick: Rust doesn't do tail call optimization (unless llvm decides to unprompted, iirc).

https://github.com/rust-lang/rfcs/issues/2691


> Tail-calls have been pretty standard

Not standard, since almost no programming is actually done using tail calls. Programming is done with loops, tail calls are an exotic and niche way of working.

Even in lua and rust tail calls are rarely used and the other languages you listed are extremely niche.

> Debunking the "Expensive Procedure Call" Myth or, Procedure Call Implementations Considered Harmful or, Lambda: The Ultimate GOTO

These are not explanations of utility, these are titles that you are using to claim something without backing it up.

> In other words, any implementation of procedure/function/method-calls which doesn't eliminate tail-calls is defective (slow, memory-hungry, stack-unsafe, etc.)

This is a very bold claim with no evidence for it and pretty much the entire history of programming against it.


> These are not explanations of utility, these are titles that you are using to claim something without backing it up.

That wasn't 'me claiming something', it was Guy L Steele Jr., who's worked on such "niche" languages as Java, Fortran and ECMAScript: https://en.wikipedia.org/wiki/Guy_L._Steele_Jr.

Also, it was literally the title of a paper. If citing the literature with a hyperlink to wikisource doesn't count as "backing it up", then I have no idea where you put the goalposts.

> almost no programming is actually done using tail calls

Literally every function/procedure/method/subroutine/etc. has at least one tail position (branching allows more than one). It's pretty bold to claim that there are 'almost no' function calls in those positions. I wouldn't believe this claim without seeing some sort of statistics.

> Programming is done with loops, tail calls are an exotic and niche way of working.

Loops have limited expressiveness; e.g. they don't compose, they break encapsulation, etc. Hence most (all?) programs utilise some form of function/method/procedure/subroutine/GOTO. Tail-calls are simply a sub-set of the latter which, it turns out, are more powerful and expressive than loops (as an obvious example: machine-code doesn't have loops, since it's enough to have GOTO (AKA tail-calls)).


> That wasn't 'me claiming something', it was Guy L Steele Jr.

You still just copy and pasted titles, this isn't evidence of anything.

> If citing the literature

Then put in the part you think is evidence or significant. This is the classic "prove my point for me" routine. You're the one who wants to change a standard.

> Literally every function/procedure/method/subroutine/etc. has at least one tail position

Are you conflating general functions with tail call elimination?

> Loops have limited expressiveness; e.g. they don't compose, they break encapsulation,

Why would that be true? How would looping through recursion change this?

> Hence most (all?) programs utilise some form of function/method/procedure/subroutine/GOTO

What does this have to do with tail call optimizations? Web assembly has functions.

> machine-code doesn't have loops,

Web assembly is not the same as machine code

I think overall you are thinking that making claims is the same as evidence. You haven't explained any core idea why tail call optimizations have any benefit in programming or web assembly. You basically just said a well known language creator put them in some languages. There is no explanation of what problem is being solved.


> Web assembly is not the same as machine code

Indeed; if web assembly allowed unrestrained GOTOs (like machine code) then compilers would already be able to do tail-call elimination.

---

> Are you conflating general functions with tail call elimination?

> What does this have to do with tail call optimizations?

> You haven't explained any core idea why tail call optimizations have any benefit

Sorry, I think there have been some crossed wires: I was mostly pointing out the absurdity of your statement that "almost no programming is actually done using tail calls" (when in fact, almost all programs will contain many tail-calls).

That's separate to the question of how tail-calls should be implemented: in particular, whether they should perform a GOTO (AKA tail-call elimination/optimisation); or, alternatively, whether they should allocate a stack frame, store a return address, then perform a GOTO, then later perform another GOTO to jump back, then deallocate that stack frame, etc.

> You haven't explained any core idea why tail call optimizations have any benefit in programming or web assembly

Based on my previous sentence, I would turn the question around: what benefit is gained from allocating unnecessary stack frames (which waste memory and cause stack overflows), performing redundant jumps (slowing down programs), etc.?


> almost no programming is actually done using tail calls

I'm talking about tail call optimization, which is what this whole thread was about. What are you talking about?

> Based on my previous sentence, I would turn the question around:

That's not how it works, since you're the one wanting a standard to change.

> what benefit is gained from allocating unnecessary stack frames

Where are you getting the idea that the webasm JIT has to allocate 'unnecessary stack frames'?

> stack frames (which waste memory and cause stack overflows)

This makes me think you don't understand how the stack even works. The memory is already there and stack memory is very small. You can't both 'waste memory' and overflow the stack at the same time. This stuff is fundamental to how computers work.

It seems like you don't even know or understand what problem you are trying to solve. Where did you get this absurd idea that the stack is a problem? Also do you think that rust doesn't use a stack?


nonsense trends like an "assembly" language supporting jumps?


I will quote you from a different comment:

> WASM is not really an assembly language.


One of the main reasons it's not really an assembly language is that it doesn't support jumps! This is fixing a defect!


> This is fixing a defect!

Only according to people using niche languages that are already slower than javascript, not according to anyone designing, implementing and using webasm.


This would allow you to do things like compile other assembly languages to webassembly.


What are you basing that on? If that were true then binaries for different CPU ISAs would also be able to be statically compiled to each other.


Even emscripten, which compiles LLVM to WASM, has to use a relooper algorithm to get around the lack of jumps.


Amazing and sad that V8 is going to add tail call elimination in WASM before JavaScript.


How did Safari implement tail calls but not Chrome? It's the one thing that discourages me from continuing to read the new SICP with JS.


does some JS idiom rely on TCO or is this for transpiled languages?


It would help any recursively-defined code. Parsers for some input formats are often more naturally expressed that way, for instance, where rewriting as an imperative loop requires mixing different logical states. E.g. a JS JSON parser might determine whether the current character starts an object, an array, or a leaf type, and then pass the input along to whichever handler that is; if it was an array, for instance, that handler would then pass back to the initial function one character over, to determine the type of the first element of the array. This is easy in a recursive style, but in an imperative style all of this logic would have to be mixed into the same function, with some state-tracking variable to decide which branch to use.

You can use trampolines[1] to get around this, but there's a performance penalty versus TCO and other recursive optimizations.

[1]: https://blog.logrocket.com/using-trampolines-to-manage-large...


Honestly, the impression I get is that it seemed like a cool thing to have, rather than something that came out of genuine need. I very rarely see recursive functions in the wild, and while I quite like them and probably use them more often than average, I rarely write tail-recursive functions.

Specifically, Javascript has pretty good iteration tools, so the basic recursion-as-iteration stuff isn't necessary and is usually pretty inefficient. Which means recursion tends to only get used when you're dealing with a recursive data structure of some description, or doing something naturally recursive. But in my experience these cases of natural recursion tend not to lend themselves to tail recursion - in the general case, you need a stack argument somewhere to allow the tail recursion to work, at which point you might as well just write a non-recursive, loop-based version.

The result of this is that truly tail-recursive functions are fairly rare in "idiomatic" Javascript code, which means implementation effort spent there is going to affect relatively little code. I could imagine that transpiled code might use this technique more often, but then the rise of WASM probably means that fewer languages are going to see Javascript as their primary compilation target, and instead choose WASM.


Natural recursion that stems from data structures will be tail-recursive if the data structures themselves are recursive in that manner. Thus e.g. walking a linked list is a problem that naturally translates to tail recursion, but walking a binary tree isn't.

But also, part of the reason why you don't see much deeply recursive code in the wild is because there's no guarantee that it'll be tail-call-optimized, which means that it'll likely fail on large inputs, and will be written iteratively to begin with.


Even many operations performed on a linked list will not be tail recursive if you do them naively. For example, a naive sum function might look like:

    type Node = { value: number; tail?: Node };

    const sum = (node: Node): number => {
        if (!node.tail) return node.value;
        return node.value + sum(node.tail);  // not a tail call: the addition happens after the recursive call returns
    }
Even if tail call optimisation were a thing, I'd still expect most recursive functions to be written in a non-tail-recursive fashion, and only later optimised to be tail recursive. But in practice, it's already possible to rewrite those functions in an imperative way when they do become a performance issue.

So in practice, I can see why implementors don't prioritise TCO in Javascript, coming from both directions: there's not a lot of free wins available, because tail calls are rarely the most natural way to express a recursive algorithm; and even if you want to explicitly optimise for them, there's already plenty of ways to optimise an inefficient recursive algorithm if that's necessary.

That said, I can see the value a lot more in a language that's being used as a compilation target, which is probably why it made more sense when there was more buzz around compile-to-JS languages, and why it makes more sense now with WebAssembly.


amazing I agree, sad - no, not really, why?


Because it's been part of ECMAScript since ES 6 but still not implemented. The only browser supporting it is Safari, I think.


Very exciting to see this continuing to progress and hats off to those who kept pushing on the proposal to get it here.

I've been building a little compiler from a toy dialect of Lisp to WASM as a side project but let it fall by the wayside. I should pick it back up now that tail calls are usable in several runtimes.


We (Leaning Technologies) are proud to have been a big contributing factor to the standardization of WebAssembly tail calls.

https://news.ycombinator.com/item?id=32069418



If they had focused on finally getting WebIDL done instead of perpetually postponing direct DOM access, wasm would already be much further along.



