
Using GCC's Stack Smashing Protector on Microcontrollers - antoinealb
http://antoinealb.net/programming/2016/06/01/stack-smashing-protector-on-microcontrollers.html
======
TickleSteve
There are other features you can use if you're willing to accept a bit of
overhead in the name of robustness, namely the code instrumentation feature
('-finstrument-functions')

Using that option, you can provide relatively strong stack protection and even
a permissions system to restrict parts of your code from accessing others.

For even more robustness, you can do random stack offsetting, return value
checking, return opcode checking, per-task stack bounds checking, heap
metadata protection, heap bounds-check, etc.

All this does come at a price tho: every function is instrumented on entry and
exit, so it's not the most performant. It does provide pretty good protection
against buffer and stack smashing exploits.
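
As a rough illustration of the entry/exit instrumentation described above, here is a minimal sketch of the two hooks GCC calls when code is built with -finstrument-functions. The hook names are GCC's own; the per-task bounds, the violation counter and the helper names are assumptions made for this example, not the commenter's actual scheme.

```c
#include <stdint.h>

#define NO_INSTR __attribute__((no_instrument_function))

/* Hypothetical per-task stack limits; a real scheduler would update
 * these on every context switch. */
static uintptr_t task_stack_lo;
static uintptr_t task_stack_hi;
static volatile unsigned stack_violations;

NO_INSTR void set_task_stack_bounds(uintptr_t lo, uintptr_t hi)
{
    task_stack_lo = lo;
    task_stack_hi = hi;
}

NO_INSTR unsigned stack_violation_count(void)
{
    return stack_violations;
}

NO_INSTR static void check_frame(uintptr_t frame)
{
    if (frame < task_stack_lo || frame >= task_stack_hi)
        stack_violations++;   /* real firmware would log and reset here */
}

/* Called by GCC-generated code on entry to every instrumented function. */
NO_INSTR void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    check_frame((uintptr_t)__builtin_frame_address(0));
}

/* Called by GCC-generated code just before every instrumented function returns. */
NO_INSTR void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    check_frame((uintptr_t)__builtin_frame_address(0));
}
```

The hooks themselves carry no_instrument_function so that they do not recurse into each other. Each one runs on every call and return, which is where the overhead comes from.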

Also, since this is an embedded system with a well-defined memory map, you can
perform very strong pointer-checks (not simply NULL checks) against exactly
the allowed ranges (a simple NULL check has a 1 in 4.2 billion chance of
catching an issue).
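
A range check of the kind described above could be sketched as follows. The memory map (64 KiB of SRAM at 0x20000000) and the function name are assumptions for illustration; real values would come from the part's datasheet or linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory map: 64 KiB of SRAM at 0x20000000. */
#define RAM_START 0x20000000u
#define RAM_SIZE  0x00010000u

/* True only if the whole range [addr, addr + len) lies inside RAM. */
bool addr_in_ram(uint32_t addr, uint32_t len)
{
    if (addr < RAM_START)
        return false;

    uint32_t offset = addr - RAM_START;

    /* Compare against the remaining space so addr + len can never wrap. */
    return offset <= RAM_SIZE && len <= RAM_SIZE - offset;
}
```

On the target one would pass `(uint32_t)(uintptr_t)ptr`. Note that an address such as 0x00000004 passes a NULL check but is rejected here.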

~~~
JoachimSchipper
Do you have a writeup somewhere? And/or a rough estimate of the overhead of
your scheme?

~~~
TickleSteve
I have docs... but not in a publishable format currently. I have been
considering doing a series of posts about the techniques tho as they are
widely applicable and do significantly increase robustness.

Currently you do pay a performance (and hence power-usage) overhead of ~5%
depending on how many 'domains' you wish to protect but that can be optimised
and tuned to your particular application.

Certainly for some applications the robustness and protection from malicious
message payloads is well worth the performance hit.

------
codys
It's really nice to have these things written out. The interfaces used by gcc
options that insert extra code/calls are typically undocumented (except for
their source code).

kasan, asan, and ubsan are similarly lacking in docs (but are also composed in
much the same way). I imagine some of the profiling hooks are done similarly,
but I have not yet looked into how to use them.

It'd be really nice if gcc considered these semi-public interfaces that they
should document.

> I plan to add other debugging features in the following weeks. I have
> an idea on how to use the MPU to prevent thread stack overflows (different
> from the buffer stack overflow we explored today).

On that note, one can actually abuse the debug hardware on cortex-m3 chips
without an MPU to add stack overflow protection. Simply need some hooks into
the scheduler around where it switches tasks (some already poison the top of
their stacks & check for the poison on switching, the keep-out can typically
be applied very close to these locations).

The debug hardware takes a base+mask, so one can theoretically block access to
wider spans of memory than a single u32, but at that point calculating the
best way to cover that memory range becomes a bit difficult.
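
To make the base+mask difficulty concrete, here is a small sketch that computes the smallest naturally aligned power-of-two block covering a range, which is what a single comparator can express. The function names are made up for this example, and the maximum usable mask size is implementation defined on real parts, so the result may not be programmable as-is.

```c
#include <stdint.h>

/* Number of low address bits that must be ignored so that one base+mask
 * comparator matches every address from first to last (inclusive). */
unsigned dwt_mask_bits(uint32_t first, uint32_t last)
{
    unsigned bits = 0;

    while (bits < 32 && (first >> bits) != (last >> bits))
        bits++;
    return bits;
}

/* Base address of the aligned block selected by that mask size. */
uint32_t dwt_block_base(uint32_t first, unsigned bits)
{
    return bits >= 32 ? 0u : (first >> bits) << bits;
}
```

An aligned word needs 2 mask bits, but a 16-byte range from 0x20000FF8 to 0x20001007 straddles a 4 KiB boundary and needs 13 bits, so the single comparator ends up covering 8 KiB starting at 0x20000000. That over-coverage is why picking ranges gets awkward.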

~~~
TickleSteve
Curious to know about the debug-abuse trick, I've implemented an MPU-based
stack protection scheme that involves poisoning the return address to trap
returns and implement a 'sliding' stack window.

It does work, but due to the limitations of Cortex MPU windows I've moved over
to code instrumentation for our protection which involves a very much beefed-
up version of the GCC stack protector.

~~~
codys
I basically was using the Debug Watchpoint and Trace (DWT) block of the armv7m
architecture (the ARMv7-M architecture reference manual specifies it).

DWT includes "comparators" typically used as hardware watch points. My
methodology was to use one of the DWT comparators to watch for something
reading/writing the last u32 of the current task's stack. Note that this might
not be perfect as it could be possible for the stack to somehow skip over the
last u32.

Then, install a handler for the Debug Monitor exception (which triggers when a
watch point is hit) to disable the DWT comparator, print/store the appropriate
info, and finally reset the chip.

Essentially, we're behaving like a simple "debug monitor" which is only
capable of setting a single hardware watch point automatically.
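
The comparator setup described above might look roughly like this. It is a sketch under assumptions: the register layout and bit positions follow my reading of the ARMv7-M architecture reference manual and should be checked against it, the function and type names are invented, and the registers are passed in as pointers so the logic can be exercised off-target.

```c
#include <stdint.h>

/* One DWT comparator: COMP, MASK, FUNCTION, then a reserved word
 * (16-byte stride between comparators). */
typedef struct {
    volatile uint32_t comp;
    volatile uint32_t mask;
    volatile uint32_t function;
    volatile uint32_t reserved;
} dwt_comparator_t;

#define DEMCR_MON_EN       (1u << 16)  /* enable the Debug Monitor exception */
#define DEMCR_TRCENA       (1u << 24)  /* enable the DWT block */
#define DWT_FUNC_WATCH_RW  0x7u        /* watchpoint on read or write */

/* Arm one comparator on the last word of the current task's stack.
 * Intended to be called from the scheduler on each task switch. */
void watch_stack_limit(dwt_comparator_t *cmp, volatile uint32_t *demcr,
                       uint32_t stack_limit)
{
    *demcr |= DEMCR_TRCENA | DEMCR_MON_EN;

    cmp->function = 0;                /* disable while reprogramming */
    cmp->comp = stack_limit & ~0x3u;  /* word-aligned address to watch */
    cmp->mask = 2;                    /* ignore the two low address bits */
    cmp->function = DWT_FUNC_WATCH_RW;
}
```

On a real chip the arguments would be something like `(dwt_comparator_t *)0xE0001020 + n` for comparator n and `(volatile uint32_t *)0xE000EDFC` for DEMCR. The Debug Monitor handler that disables the comparator, records the fault and resets the chip is left out here.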

Note that DWT is technically an optional component in armv7m, so it may not be
functional on every chip. That said, most folks like having debug capabilities
(this is the same hardware that jtag debuggers use to set watchpoints), so
most seem to include them. On that note: doing this can (to some extent)
interfere with external debuggers. To mitigate some of that, I did try to use
the last DWT comparator instead of the first, but this will really depend on
how your debugger works.

~~~
_fs
I implemented something similar on a ColdFire cpu. I was able to protect
against the skip-over problem you describe by combining the hardware debug
point with -fstack-check to create a "canary" zone at the bottom of every
stack. This prevents a huge buffer that goes past the bottom of the stack
(which would not normally hit your single watchpoint) from overwriting code
far outside of the stack space.

~~~
TickleSteve
yes, that would work for that.

In my case, with my domain protection system I had well-defined
domain-crossing points across which I could guarantee that no data was being
passed by reference. At these points I was trying to fit the Cortex MPU
windows as closely as possible over the 'used' stack area so I could prevent
access to it by any potentially malicious code. I would poison the return
address to cause a known fault when the code returned and move the MPU windows
accordingly.

Although it worked, due to the limitations of the v7-M arch MPU windows the
protection was never complete and proved to be time consuming to implement due
to the size of the application. Maybe in future it will become more practical.

------
brandmeyer
One obvious gotcha with this technique is that a typical attacker will have
full access to the assembly code of the target under attack. Therefore, checks
against a constant will not be very useful. Fortunately, some microcontrollers
are equipped with a hardware random number generator. You can initialize the
value of __stack_chk_guard with a function that has
__attribute__((constructor)) and does not itself use memory in a way that
would cause GCC to emit stack protector code for it. That way the canary value
will be initialized during the startup code before entering main().
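
A minimal sketch of that idea follows. `__stack_chk_guard` and the constructor attribute are GCC's; `board_rng_read`, the `BARE_METAL_TARGET` macro and the zero-replacement rule in `canary_from_entropy` are assumptions introduced for this example.

```c
#include <stdint.h>

/* Map raw RNG output to a canary. Zero is replaced because an overflow
 * that writes zeros would otherwise go unnoticed; the fallback constant
 * is arbitrary and is this sketch's own choice. */
uintptr_t canary_from_entropy(uint32_t raw)
{
    return raw != 0 ? (uintptr_t)raw : (uintptr_t)0xA5C35A3Cu;
}

#ifdef BARE_METAL_TARGET            /* hypothetical build macro */
uint32_t board_rng_read(void);      /* hypothetical hardware RNG driver */

uintptr_t __stack_chk_guard;        /* value read by -fstack-protector code */

/* No local arrays in here, so GCC has no reason to give this function a
 * canary of its own while the guard value changes underneath it. */
__attribute__((constructor))
static void init_stack_guard(void)
{
    __stack_chk_guard = canary_from_entropy(board_rng_read());
}
#endif
```

On bare metal this only runs if the startup code actually walks the constructor table (.init_array) before main(), and the RNG peripheral has to be clocked and ready by then.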

~~~
TickleSteve
the check is also quite weak in that it's only really protecting a single
point. It's quite easy to stride "over" the protected area rather than going
"through" it.

TBH, it will only ever protect against accidental errors, not malicious
intent.

~~~
_fs
True, but you can use -fstack-check with a well-defined canary space and
correct probe interval to prevent the jump over attack you are describing.

