Maybe an actual stack language like Forth could do so? The stack overhead of calling a word is a lot lower to begin with, and ColorForth for already has tail call optimisation[0]. More advanced concatenative languages like Factor[1] or Kitten[2] might be able to perform more interesting optimisations.
[0] http://wiki.c2.com/?TailCallOptimization
[1] https://factorcode.org/
[2] http://kittenlang.org/