

Can a local variable's memory be accessed outside its scope? - danso
http://stackoverflow.com/questions/6441218/can-a-local-variables-memory-be-accessed-outside-its-scope/6445794#6445794

======
tmoertel
The original question asked about C++, but in the old days of assembly
language, accessing "out of scope" data from the stack was fairly useful.

For example, Apple II computers had a series of slots into which expansion
cards could be inserted to add new features to the computer. These cards, as
you might expect, often needed controller software, and this was supplied via
on-card ROM. The code in this ROM, by convention, was mapped into memory at
address $Cx00, where x was the slot number the user happened to choose for the
card. As a result, when this code was called, it had no idea where it resided
in memory and, consequently, which slot its card was in. To figure these
things out, the controller code used the following trick:

    
    
        JSR IORTS
        TSX
        LDA $100,X
    

The JSR instruction makes a subroutine call, which causes the address of the
next instruction to be pushed onto the system stack and then transfers control
to the subroutine, in this case a well-known subroutine in system ROM called
IORTS. This routine, as its name implies, is just an RTS instruction, so it
returns immediately back to its caller. Control thus returns to the second
instruction of our code, which copies the stack pointer into the X register.
The third instruction then uses this pointer to read _above_ the top of the
system stack to obtain the value that was there during the call to IORTS. This
value, of course, was the most-significant byte of the return address pushed
onto the stack by the JSR instruction. Now the code knows where it resides.

If you want to see a prime example of this technique in some famous code hand-
assembled by Steve Wozniak 35 years ago, take a look at page 22 of the
following PDF file. It contains the boot code for the Apple Disk II controller
card:

[https://s3.amazonaws.com/s3data.computerhistory.org/atchm/do...](https://s3.amazonaws.com/s3data.computerhistory.org/atchm/documents/102723983-05-01-acc.pdf)

~~~
makomk
From what I recall, that's not necessarily safe on modern OSes and processors
due to interrupts, but there's still buggy code out there that relies on it.

~~~
monocasa
I can't think of a modern OS/ISA combination where that's the case. State will
get saved on a kernel mode stack (if not somewhere else totally). This is
because the stack pointer is controlled by user mode, and you don't want it to
do something like set the stack pointer to some area in kernel space, and then
invoke a software interrupt in order to overwrite privileged code or data.

~~~
billforsternz
Maybe in modern OS/CPU combos, but on a 70s era 8 bit CPU like the 6502 the
code as described above is definitely vulnerable to a hardware interrupt
overwriting the out of scope memory area in question before it is used. One
way to make it safe would be to disable interrupts, although non-maskable
interrupts would still be a problem.

However, although I haven't checked the details, I suspect that something much
cleverer does make the trick robust. What would the hardware interrupt service
routine overwrite the out of scope memory with? For many CPUs, maybe for the
6502, it would be the return address from the interrupt service routine -
which is slightly different to the originally pushed return address, but still
on the same page, which is all that's required. So a little bit of Woz magic
perhaps.

------
anon4
Every programmer that uses a language with C's memory model (and preferably
absolutely every programmer) should know the following by heart:

    
    
        |-------| - max address
        |STACK ↓|
        |  SP   | ← stack pointer
        | ..... |
        | ..... |
        |HEAP  ↑|
        |-------|
        |other  |
        |mapped |
        |memory |
        |-------| - zero
    

Barring alternate memory managers (if you use one, you know the diagram and
are now writing a post about why it's wrong), stack grows down, heap grows up.
When you call a function, the address of the next instruction in the current
function is pushed onto the stack, SP is decremented by the total size of the
stack variables in the called function, and each variable in the called
function is then addressed at SP + x, where x is an offset calculated by the
compiler. When the function returns, the memory isn't cleared: SP is
incremented back past the variables, the address of where we left off in the
caller is read, and the processor resumes from that point. The "push" and
"pop" CPU instructions don't allocate memory, they're just a shorthand for
adjusting SP and copying a value.

For a fun demonstration, compile this C code with -O0 (optimizations off, or
debug build in Visual Studio, IIRC):

    
    
        #include <stdio.h>
    
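        /* a is never initialized: reading it is undefined behavior, but an
           unoptimized build typically returns whatever is in its stack slot. */
        int foo(int unused) {
            int a;
            return a;
        }
    
        /* b usually lands in the same stack slot that foo's a occupies. */
        int bar(int x) {
            int b;
            b = x;
            return b;
        }
    
        int main(int argc, char** argv) {
            printf("%d\n", foo(0));  /* leftover garbage */
            bar(10);
            printf("%d\n", foo(0));  /* often 10 at -O0 */
            return 0;
        }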

~~~
yongjik
Not always. Stacks can grow up in some CPUs.
[http://stackoverflow.com/questions/664744/what-is-the-direct...](http://stackoverflow.com/questions/664744/what-is-the-direction-of-stack-growth-in-most-modern-systems)

And then there's Itanium, which has _two_ separate stacks, one growing up and
one growing down...

------
zokier
Kinda, but not exactly, the same kind of issue that is prevalent when learning
private/protected/public (or the equivalents). They do not actually prevent
outside code from accessing the bits you specify, as you'd expect if you think
of them as similar to file permission bits. Rather, they are hints for the
type system and usually invisible at runtime.

In the same way, scopes do not actively prevent outside code from accessing
the variables within. Rather, they are hints to the compiler that these
variables are associated with these bits of code, and at runtime they are
usually just laid out in the most efficient way rather than in a way that
enforces the scope boundaries.

------
rayiner
Bonus question: can you get the example to print anything other than 5? The
fact that the example prints 5 is an artifact of having allocated "a" on the
stack. However, the compiler does not necessarily have to do that. If the
compiler can determine that the pointee of "p" is undefined, you should be
able to get it to print the value of whatever was previously in the register
allocated for the pointee of "p" (which will probably be RSI on an x86-64
machine since IIRC the "cout" statement gets turned into a two-argument
function call, the second of which is passed in RSI).

I'm not sure if any compiler will actually do this, though. They might just
see that the address of "a" is taken at some point and refuse to allocate it
to a register.

~~~
prasun
If you enable optimisation, you probably would see something other than 5.

------
TillE
Do people really find extended analogies like this helpful? If you have even a
vague understanding of how computers work, it just seems confusing to me.

~~~
mturmon
The pleasure for me in reading this extended analogy was making the parallels
between the analogy and the computer equivalent, as I read.

For example, "someone might have replaced the nightstand by an armoire" (e.g.,
what used to be a Bar object is now a Foo), and "someone might be tearing up
the book just as you walked in" (e.g., an asynchronous process may be
destroying the object piece by piece, concurrently with your own execution).

------
IgorPartola
My short explanation: every time you call a function, its vars get allocated
on the stack. When you return from the function, you "pop" the stack, but all
that means is that the stack pointer is now pointing to main. There is no
reason to actually clear the stack memory as that would be a waste of CPU
cycles. Therefore, if main() calls foo(), then foo() returns, the contents of
the variables of foo() are still one frame higher on the frame stack and given
the right memory address you can still access them.

As others point out, this is not something you should rely on. On the other
hand if you are trying to overflow the stack or somehow break the program,
this is definitely something to try.

The converse of this is that local variables of main() can be made accessible
to foo():

    
    
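        #include <stdio.h>
    
        /* x points into main's frame, which is still live during this call. */
        void foo(int *x) {
            printf("a = %d\n", *x);
        }
    
        int main(void) {
            int a = 12;
            foo(&a);
            return 0;
        }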
    

This of course makes sense: when you are in the middle of foo(), main()'s
variables have to be stored somewhere and are accessible.

Edit: here's another fun way to get _a_ from foo():

    
    
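        #include <stdio.h>
    
        void foo() {
            int *x;
            /* Undefined behavior: guesses where main's a sits relative to x.
               &x is an int **, so this steps sizeof(int) pointer-sized slots;
               the right offset depends on compiler, flags, and ABI. */
            x = (int *) (&x + sizeof(int));
            printf("a = %d\n", *x);
        }
    
        int main() {
            int a = 12;
            foo();
            return 0;
        }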

~~~
T-hawk
_Therefore, if main() calls foo(), then foo() returns, the contents of the
variables of foo() are still one frame higher on the frame stack and given the
right memory address you can still access them._

Unless an interrupt occurs between returning from foo() and reading those
leftover stack variables. foo()'s abandoned stack space gets smashed by at
least the return address from the interrupt handler, plus anything the handler
itself pushes.

This falls into the class of use-after-free bugs. Like most instances of such,
the technique works until something makes it not do so.

~~~
IgorPartola
Great point! Yes, that's exactly what this is: use-after-free. That's why I am
saying that this is useful when you are trying to somehow break the program,
but not useful when you are doing constructive things. Presumably, when you
are trying to break the program, you can run it multiple times, and chances
are that at some point the interrupt will not happen.

------
deletes
No. The code in question is invoking undefined behavior, making the question
pointless.

~~~
mikeash
It is often instructive and useful to look at the behavior of real
implementations, not just the idealized behavior given by the standard.

~~~
IsTom
When compiled with -O3 using g++ it returns 0 8. Compiler _will_ use undefined
behavior to optimize.

~~~
mikeash
There's a fun example with clang where you can have two pointers where:

    
    
        x == y
    

But:

    
    
        *x != *y
    

It involves invoking undefined behavior by using a pointer after it's been
freed and arranging for a new pointer at the exact same location. Clang caches
the contents of the old pointer and uses them in the comparison.

------
a-priori
One thing it depends on, that I don't think anyone has mentioned, is the
compiler's register allocation logic and how that interacts with the rest of
the code in the calling function.

If, at the point you call the function, all registers are used, the following
statement to retrieve the value may cause a register to be spilled to the
stack. That may overwrite the data. That's more likely to happen on a
register-poor architecture like x86.

------
al2o3cr
Short answer: maybe, but you shouldn't depend on doing that. Most compilers
will even warn you about doing this exact thing, depending on their settings.

------
JoeAltmaier
The stack is a cheap/fast memory allocator. Upon return (out of scope for
local variables) it's freed. Don't point to freed memory.

------
Nursie
My instincts are that it's in stack memory, not heap, so there won't
necessarily be a violation, and as you're not overwriting with anything (or
making a new stack frame), sure the memory's still there and unaltered.

You can write all over your stack if you like. It's a pretty bad idea though.

~~~
GeneralMayhem
Yeah, it looks to me like it should work up until you make another function
call. Of course, that's assuming the platonic ideal of a memory model, which
may not actually hold after the compiler, linker, loader, and memory paging
have all had their way with it.

------
fit2rule
Yes, if you're an idiot. No, if you're not.

EDIT: okay, I'm getting downvoted. But the truth of the matter is, it's
_highly_ dependent on whether the developer understands what they're doing.
Whether the developer knows or doesn't know, either way, accessing variables
off the stack in this way, with a flying pointer, is pretty idiotic. It's not
going to result in great software, people.

~~~
coldtea
You're getting downvoted because:

1) It's not a great insight in the first place. It's common sense and well
known that you shouldn't access data this way.

2) It doesn't answer the question, which is about the mechanics (how does it
happen?), not the quality of the programmer (what kind of programmer does
that?)

3) It's needlessly insulting.

4) It's plainly wrong. It might be "idiotic" to access memory this way, but
that doesn't mean that only idiots do it, either with intent (e.g. to hack
into something, or speed up some code with pointer arithmetic), or without
(e.g. by accident). It can happen (and it has happened) even to Dennis Ritchie.

~~~
redblacktree
Not Kernighan though. He'd never make a mistake like that.

