

Stack-checking a program... that will execute in orbit - ttsiodras
http://users.softlab.ece.ntua.gr/~ttsiod/stackusage.html

======
mstevens
These arguments never quite make sense to me.

The reasoning appears to be "since technique X can cause problems in some
circumstances, it can never be used in software for domain Y (usually
avionics, space, or similar)".

However, obviously _every_ technique could cause problems in some
circumstances. Therefore I conclude it's impossible to ship software into
space.

Since software clearly is shipped into space, it seems likely there is some
problem with the above argument. My suspect is the "could cause problems?
Can't use it!" line of reasoning is oversimplified for the purposes of
explanation, and _hopefully_ something better justified is actually occurring.

~~~
Someone
I would think the reasoning would be "pick the low-hanging fruit to improve
reliability".

Checking a programmer's claim that a program will not run out of memory is
difficult (sometimes impossible) if you allow heap allocations and arbitrary
stack usage. You could allow either, if the programmer supplies a proof that
things are OK. But with such a proof in hand, writing the thong so that it can
be mechanically checked is "just work" (tedious work, possibly, but for this
application, that is not that important). Because of that, why risk that such
a proof is flawed (or becomes flawed after a bug fix), if one can just as well
prevent the problem?

Of course, the checking code will be buggy, too. That is OK; it will be
written by a separate team, and will improve over time, just as the code it
checks does.

I guess that, even with these checks, the program will still have some
watchdog timer, a way to remotely reset it, and a way to upload new software,
just in case it runs wild.

~~~
ttsiodras
> I guess that, even with these checks, the program will still have some
> watchdog timer, a way to remotely reset it, and a way to upload new
> software, just in case it runs wild.

Indeed.

As for the reasoning: couldn't have said it better myself.

------
jmaygarden
I would note that determining maximum stack usage is similar to the halting
problem in some cases. One must carefully restrict how the program is
structured for this analysis to be valid. As is stated, any sort of recursion
or mutually dependent functions are out, and the use of function pointers
could also cause issues.

This approach provides a good baseline. However, I'd highly recommend
empirical testing as well. For example, create a sufficiently large stack,
initialize the stack with a known bit pattern and then check the high water
mark after fully exercising the program.

~~~
eru
I agree with your first two sentences.

Why should recursion be out? You just need to be careful with it. Not having
an infinite recursion is basically the same problem as not having an infinite
loop.

~~~
rgoddard
Since with recursion you can have a large but finite number of calls that
could blow the stack, but not have the same problem with a for loop.

~~~
eru
Yes, you have to account for that. Also, tail recursion should be fine, if
your compiler is smart enough. (I guess since the program is looking at
assembly, it couldn't tell optimized tail recursion from a loop.)

------
jimrandomh
Static analysis to prove that code doesn't run out of memory is cool. However,
completely refusing to use heap allocation, because of the possibility that
some heap-using code will run out of memory, is completely insane. The
ridiculous things you'll have to do with data structures to avoid using the
heap will introduce more bugs than this prevents.

~~~
ttsiodras
You can try convincing space companies that they are in error; good luck with
that :-)

------
ajross
Not to burst any bubbles, but the static analysis technique shown is capable
of detecting recursion only in static call chains. It doesn't seem to even
have any handling (even to say "warning! unhandled call!") for something like
function pointers. Feed this a C++ program with recursive virtual functions
and it will, apparently, return "OK!"

~~~
ttsiodras
Actually, function pointers are... not allowed in space SW.

And remember, this is just a hack I did - I wanted to show people the things
one can do with objdump. From the article:

"This kind of functionality is usually part of far more mature tools than...
quickly-hacked Python scripts. Don't use this to send your spaceship in orbit
:-)"

~~~
ajross
Still misses the point. A quickly hacked script can be really useful if it
shows real bugs (where a recursive call in this case is a "bug"). There's
nothing wrong with the development methodology: who cares how ugly the code is
that found the bug as long as it's fixed?

The problem is that the mathematical problem being attacked is undecidable.
This script only approximates a solution in the case where function pointers
don't exist. Even _that_ would be OK if you knew about the limitation and the
script flagged indirect jumps as errors too. It looked like you didn't realize
that though, so I thought it had to be mentioned.

~~~
eru
You could send in a patch.

~~~
ajross
... that solves the halting problem?

~~~
wlievens
You're using the Turing Tarpit here. You do not need to "solve the Halting
Problem" in order to make useful static analysis software.

When the software can prove X, return True.

When the software can prove !X, return False.

When the software can't do either, return Sorry.

------
zwieback
I generally also trace stack usage by filling the stack with a fixed pattern
and watching it in my trace probe.

~~~
wlievens
Yes, but that's a form of dynamic analysis. To prove its applicability, you
need to run it for every possible input (or a reduction that comes down the
same thing).

------
sausagefeet
Writing code to execute in orbit seems interesting. Do you have any further
suggested reading?

~~~
ttsiodras
MISRA-C, for starters.

Developing safety-critical code is governed by extensive guidelines (like
DO178-B) which dictate lots of things, not just about code (process of
development, testing, etc)

------
zandorg
I just rewrite the stack function to use heap (eg, a queue) instead of stack.

~~~
ajross
The application in question disallows heap usage. In "no runtime errors
allowed" environment like this, everything is pre-allocated. This is easy for
data structures (though it gets hairy once the usages start overlapping). This
including the stack, which is hard to prove bounds for with traditional tools.
So the author wrote some scripts to parse the disassembly.

~~~
icefox
What about having the heap allocation actually use the stack?

~~~
rcfox
That kind of defeats the purpose. Heap allocation is meant for objects that
persist for the long-term. Stack allocation is meant for short-term storage,
and anything that is allocated on the stack is lost when the function returns.

~~~
apaprocki
You can always allocate stack space (static or dynamic with alloca()) in
main() before kicking off the application.

~~~
ajross
In which case it's isomorphic to a static memory block (albeit one with non-
zero initialization costs), so what would the point be?

~~~
icefox
Well the whole article seemed silly after he said this:

"First of all, you can't use heap - not when you're in orbit. Why? Because
quite simply, if you do use heap, and malloc returns null... what do you do?
You can't ask the user for help"

This is the OOM problem and is highly dependent upon the kernel you are using.
Not bringing the kernel into the discussion means that the discussion is just
silly. There is not much of a point to putting the heap in the stack as you
said, but hey like the author we are just getting some ideas out there without
bringing up the kernel.

~~~
ttsiodras
I think you misinterpreted the article: the target is not Linux/x86, its a
custom board with the SPARC/Leon processor running a custom embedded OS
(RTEMS), launched in space :-)

The article is just an example of the notions used, and how objdump can be
used to enforce them. Linux is used simply for demonstration purposes.

In your average PC with Linux/x86, it is clear that you fear neither stack nor
heap error - you just restart the app :-)

~~~
icefox
I think you misinterpreted my comment. I didn't mention Linux anywhere in it,
but specifically talk about how when dealing with OOM you must first discuss
what the kernel provides. On RTEMS can malloc actually return 0 or does it
never return 0 and there is a signal or some other event that is sent out from
the kernel? These are _the_ most important bits of information in this
problem.

