

Eliminating the Call Stack to Save RAM [pdf] - vmorgulis
http://www.cs.utah.edu/~regehr/papers/lctes062-yang.pdf

======
userbinator
Amongst those who use Asm regularly this is a fairly well-known technique,
used for example in BIOS code that runs before memory initialisation; there is absolutely no
RAM available at that point, so even a call stack can't be used, but you can
still reuse code by putting the "return address" in a register and then
jumping to the common block of code:

    
    
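        mov sp, ret1    ; no RAM yet, so SP is free to hold the "return address"
        jmp block       ; "call" the shared block
      ret1:
        ...
        mov sp, ret2    ; same block, different place to come back to
        jmp block
      ret2:
        ...
      block:
        ...             ; shared code; must not clobber SP
        jmp sp          ; "return": indirect jump to whatever the caller stored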
    

You can even go somewhere else after that block instead of continuing like
a function call would, just by modifying the destination. No recursion is
possible with this, but I've seen BIOS code do more than one "level" of "calls"
by using more "return address" registers (it tends to be BP and SP).
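A rough C analogue of the same trick, as a minimal sketch: GNU C's labels-as-values extension (supported by GCC and Clang, not ISO C) lets you store a label's address in a variable and `goto *` it, which is "the caller supplies a where-to-go-next address" in C form. The names `run`, `block_a`, `block_b`, `link_a` and `link_b` are hypothetical; the two link variables stand in for the two "return address" registers (e.g. SP and BP) that allow a second level of "calls". In C these variables still live in `run`'s own frame; the point is only that the shared blocks are entered and left without call/ret.

```c
/* Assumes GCC or Clang: &&label and goto *ptr are GNU C extensions. */
int run(void) {
    void *link_a, *link_b;  /* two "return address registers", like SP and BP */
    int total = 0;

    link_a = &&after_first;   /* like "mov sp, ret1" */
    goto block_a;             /* like "jmp block" */
after_first:
    link_a = &&after_second;  /* same block, different continuation */
    goto block_a;
after_second:
    return total;

block_a:                      /* outer "function": adds 1, then "calls" block_b */
    total += 1;
    link_b = &&back_in_a;     /* second level uses the second link slot */
    goto block_b;
back_in_a:
    goto *link_a;             /* "return" from block_a, like "jmp sp" */

block_b:                      /* inner "function": adds 10 */
    total += 10;
    goto *link_b;             /* "return" from block_b */
}
```

Calling `run()` returns 22: each of the two "calls" to `block_a` adds 1 and "calls" `block_b`, which adds 10. Because `block_a` has only one link slot, a recursive "call" would overwrite it before it was used, which is why no recursion is possible.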

It's funny to see this technique being rediscovered/reinvented (AFAICS the
authors think this "flattening" is an entirely new idea), and somewhat
poorly at that: there's no need to use switch statements and their associated
complexity to map integers back into addresses, when all that's really
required is for the "caller" to supply a where-to-go-next address. This is
possible on any CPU that has an indirect-jump type of instruction. I've seen
this in Z80 and 6502 code, as well as x86; it's present in early PC
applications that were handwritten Asm.
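For contrast, a hypothetical portable-C rendering of the token-and-switch style described above (an illustration of the idea, not code taken from the paper): ISO C cannot store a label's address, so each "call site" gets an integer token and the shared block maps it back to a location with a switch. `run_flattened` and `ret_token` are made-up names.

```c
#include <assert.h>

/* ISO C only: no label addresses, so the "return address" is an int token. */
int run_flattened(void) {
    int ret_token;      /* names the call site to resume */
    int counter = 0;

    ret_token = 1;      /* "call" from site 1 */
    goto block;
site1:
    ret_token = 2;      /* "call" from site 2 */
    goto block;
site2:
    return counter;

block:
    counter += 10;      /* the shared code */
    switch (ret_token) {  /* map the integer back to a location */
    case 1: goto site1;
    case 2: goto site2;
    default: assert(0 && "unknown return token");
    }
    return -1;          /* only reached if asserts are disabled */
}
```

`run_flattened()` returns 20. The switch is the extra machinery being objected to: each "return" becomes a compare-and-branch (or a jump table) instead of the single indirect jump of `jmp sp` or `goto *link`. One plausible reason to accept it anyway is that it stays within ISO C.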

[https://sites.google.com/site/pinczakko/pinczakko-s-guide-to...](https://sites.google.com/site/pinczakko/pinczakko-s-guide-to-award-bios-reverse-engineering#Call_Instruction_Peculiarity)

~~~
jerf
"there's no need to use switch statements and their associated complexity to
map integers back into addresses, when all that's really required is for the
"caller" to supply a where-to-go-next address."

There's nothing quite like implementing an abstraction, then turning right
around and unimplementing the abstraction as a layer on top. (See also: using
relational DBs as key/value stores. Bonus points if it implements a hierarchy
that looks like a file system! Implementing unreliable data delivery on top of
TCP (which can be done by reconnections). Taking a character/block device and
implementing block/character access. Implementing streaming on top of page-based
abstractions like HTTP, implemented on top of streaming via TCP; there are
"official" ways to do this, but for a long time it qualified.) If you're
wondering where the CPU cycles are going...

~~~
Dylan16807
I don't understand what abstraction is being unimplemented with a layer on top
here. Whether you use a stack or avoid it with [multiple] link registers,
those are both reasonable methods of getting back and neither one builds on
top of the other or undoes any work that's already been done. Whether to use
tokens or addresses has tradeoffs, but I still don't see any work being
unimplemented.

At worst they avoided a stack and produced an implementation that was unoptimized
in an entirely unrelated way, not one that unimplemented anything on top.

------
dferlemann
Freeing up 20% of RAM at the cost of a 14% increase in ROM usage is fine and
all... Code readability is kind of an issue here as well.

~~~
WallWextra
Just to be clear, this is a compiler optimization. Do you mean the readability
of the generated assembly code?

~~~
dferlemann
I was pointing to the practice of flattening code.

------
srean
Can't read the PDF on my phone, so a quick question: wouldn't CPS (continuation-passing style) help?

