
Why Is My Perfectly Good Shellcode Not Working? Cache Coherency on MIPS and ARM - DyslexicAtheist
https://blog.senr.io/blog/why-is-my-perfectly-good-shellcode-not-working-cache-coherency-on-mips-and-arm
======
TickleSteve
"Shellcode" in this sense is the small number of instructions that initiates
the exploit, i.e. it's contained in the data you overflow the buffer with.

The intention is that you trick the firmware into executing your shellcode
to ramp up the exploit.

The shellcode is "executable data" and the typical prevention for this type of
exploit is execution-prevention on data (and stack) regions (for processors
that support it).

~~~
signa11
but there are always return-to-libc style attacks, which defeat
non-executable stack protection...

~~~
saagarjha
And hence we have ASLR to protect against _those_ attacks…

~~~
DyslexicAtheist
only for 64-bit, but disclaimers might apply in (32-bit) IoT land:

 _> Our results suggest that, for current 32-bit architectures, address-space
randomization is ineffective against the possibility of generic exploit code
for a single flaw; and brute force attacks can be efficient, and hence
effective_

[1]
[https://web.stanford.edu/~blp/papers/asrandom.pdf](https://web.stanford.edu/~blp/papers/asrandom.pdf)

edit: also ROP

~~~
saagarjha
32-bit IoT usually has a lot more problems that are easier to exploit than
being able to brute force ASLR, but yes: if you are able to run attacks
quickly, it’s pretty easy to get lucky and find the offset. There really isn’t
much you can do on 32 bit to fix this, unfortunately, unless you start doing
fancy things like reordering functions themselves to increase the number of
bits of entropy (and even then, you’re not getting much…)
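
To put rough numbers on the brute-force point, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the 16-bit figure in the note below is only an assumed example value for a 32-bit target, not a measurement of any particular system.

```python
def expected_attempts(entropy_bits: int, rerandomized: bool) -> float:
    """Average number of guesses needed to hit one of 2**entropy_bits offsets.

    rerandomized=False: the layout stays the same across attempts (e.g. a
    forking server), so the attacker can rule out each wrong guess.
    rerandomized=True: the layout changes on every attempt, so every guess
    is an independent 1-in-N trial.
    """
    n = 2 ** entropy_bits
    return float(n) if rerandomized else (n + 1) / 2


def success_probability(entropy_bits: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent guesses is right."""
    n = 2 ** entropy_bits
    return 1.0 - (1.0 - 1.0 / n) ** attempts
```

With an assumed 16 bits, `expected_attempts(16, False)` comes out at roughly 32,768 tries. Each extra bit of entropy only doubles that, which is why there is so little headroom on 32-bit.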

~~~
wtallis
Since IoT is somewhat less tied to existing architectures, it's not completely
out of the question that somebody could produce a microcontroller that keeps
separate stacks for return addresses and function arguments/results. If the
former stack is not directly modifiable, that removes most of the need for
ASLR.
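
As a rough illustration of the separate-stacks idea, here is a toy Python model. The class names and layout are hypothetical and do not describe any real microcontroller; the only point is that an unchecked copy into a data buffer can reach the return address in one layout and not in the other.

```python
class UnifiedStack:
    """Conventional layout: a local buffer sits directly below the return address."""

    def __init__(self, buf_size: int, return_addr: int):
        self.buf_size = buf_size
        self.mem = [0] * buf_size + [return_addr]

    def unchecked_copy(self, data: list) -> None:
        # Models a strcpy-style bug: nothing compares len(data) to buf_size.
        for i, value in enumerate(data):
            self.mem[i] = value

    def ret(self) -> int:
        return self.mem[self.buf_size]


class SplitStack:
    """Return addresses live in a separate stack that data writes cannot address."""

    def __init__(self, buf_size: int, return_addr: int):
        self.buf_size = buf_size
        self.data = [0] * (buf_size + 1)   # buffer plus one neighbouring local
        self._returns = [return_addr]      # only call/ret would touch this

    def unchecked_copy(self, data: list) -> None:
        # Same bug: the overflow still corrupts neighbouring data...
        for i, value in enumerate(data):
            self.data[i] = value

    def ret(self) -> int:
        # ...but the return address was never within reach.
        return self._returns[-1]


payload = [0x41] * 4 + [0xBAD]           # 4 filler words, then a fake address
unified = UnifiedStack(4, return_addr=0x1000)
split = SplitStack(4, return_addr=0x1000)
unified.unchecked_copy(payload)
split.unchecked_copy(payload)
```

After the copy, `unified.ret()` returns the attacker's value while `split.ret()` still returns `0x1000`. The overflow is still a bug in the split model (the neighbouring local is corrupted), so this takes away return-address hijacking rather than memory corruption as a whole.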

~~~
dfox
A major selling point of most modern microcontroller architectures is that
they don't have a separate hardware return address stack, and are thus "C
compatible". Many early microcontrollers had exactly that, as it also
typically means a simpler hardware implementation.

~~~
wtallis
"C compatible" strikes me as a very overblown way of describing the
consequences of choosing whether to have a separate return address stack. In
practice, I don't see it actually amounting to anything more than a need to
use privileged instructions to implement setjmp, which is perfectly reasonable
for a high-security environment.

------
taneq
Note: This is talking about exploiting buffer overruns, not about Bash
scripts. I was very confused for a moment there.

~~~
rocqua
shellcode vs shell-code.

Shellcode traditionally meant binary that would do something like
exec("/bin/bash") (i.e. code that launches a shell). When abusing a buffer
overflow, you would usually want to somehow get this code to be executed.
Later, the term expanded to mean any kind of user-provided bytes an attacker
wants to be interpreted and executed as machine instructions.

------
pm215
The post seems to be slightly confused about the Arm DSB and ISB instructions.
These don't affect the caches at all -- they are barriers, which affect
ordering and pipelining. DSB says "don't go any further until all memory
accesses made by instructions before this one have completed", and ISB says
"flush the pipeline so the next insn is reread from the cache" (among other
things). So self-modifying and other tricky code needs to consider these insn
ordering issues as well as cache manipulation, and barrier insns aren't a
substitute for doing the cache maintenance.
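
The distinction can be sketched with a deliberately simplified Python model. `ToyCore` and its method names are hypothetical, not Arm instructions or any real core's behaviour; it assumes a write-back D-cache and a separate I-cache with no hardware coherency between them.

```python
class ToyCore:
    """Write-back D-cache and a separate I-cache with no hardware coherency."""

    def __init__(self, memory: dict):
        self.memory = dict(memory)
        self.dcache = {}  # data written by stores, possibly not yet in memory
        self.icache = {}  # instruction words already fetched

    def store(self, addr: int, value: str) -> None:
        self.dcache[addr] = value            # lands in the D-cache only

    def fetch(self, addr: int) -> str:
        if addr not in self.icache:          # I-cache miss: fill from memory
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def barrier(self) -> None:
        # Stand-in for DSB/ISB: constrains ordering, touches no cache contents.
        pass

    def clean_dcache(self, addr: int) -> None:
        if addr in self.dcache:              # write the line back to memory
            self.memory[addr] = self.dcache[addr]

    def invalidate_icache(self, addr: int) -> None:
        self.icache.pop(addr, None)          # force a refetch next time


core = ToyCore({0x1000: "old insn"})
core.fetch(0x1000)                # the old code has been executed once
core.store(0x1000, "shellcode")   # the overflow writes new bytes
core.barrier()
after_barrier = core.fetch(0x1000)
core.clean_dcache(0x1000)
core.invalidate_icache(0x1000)
core.barrier()
after_maintenance = core.fetch(0x1000)
```

In this model `after_barrier` is still the old instruction, and only `after_maintenance` sees the new bytes. Real hardware adds details the sketch ignores, such as cache line granularity, unified outer caches, and evictions that happen to write data back on their own.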

~~~
ajross
Pipeline control certainly does affect caches. That's where the memory access
being controlled goes. I mean, it's tempting to try to understand them as an
orthogonal thing, but in fact memory hierarchies and CPU pipelines (and other
stuff like store forwarding buffers) are joined at the hip and it's pointless to
try to "understand" them in isolation.

At best it's an exercise in specification pedantry. The only reason
instruction barriers, fences and serialization instructions seem to make sense
in docs is that someone at the architecture sat down and wrote a memory and
execution model that can be abstracted by whatever API it is they present to
the user.

And someone doing performance or exploitation work at the level of the linked
article really needs to understand that model, not the API. (Though you're
absolutely right that this particular article seems to be a bit mixed up about
both)

~~~
pm215
I meant that it doesn't affect it in the sense that it doesn't change the way
the cache behaves or ask it to flush or invalidate any of its contents. You
can have a pipelined CPU which needs the barrier insns but which has no cache
at all (most likely in embedded). And if the problem you're having is that
your data is in the dcache but not the icache then the barrier insns won't
help, because they're not architecturally defined to do that.

I certainly agree that you need to understand the architectural model to write
reliable (and cross-CPU portable) code -- and the architectural model for Arm
does distinguish barriers from cache maintenance -- though if you're writing
exploit shellcode then "works most of the time and we can't do better given
the way the bug we're exploiting to get code execution works" is sometimes what
you're going to end up with, at which point knowing what the specific hardware
is doing can help.

------
classichasclass
The same principles probably apply to Power ISA (isync, hwsync, icbi, etc.),
and for that matter SPARC. Lots of PowerPC chips still turn up in embedded
applications.

------
vectorEQ
it's so perfect that it doesn't work. great stuff. instantly lost interest.

------
giancarlostoro
I had issues with my own shell script this past week or two. But my issue was
colliding hypervisors. Fun times. I'm not savvy enough with Bash as it is, so
it was frustrating.

Edit: I guess my shell code breaking frustration is somehow irrelevant to this
issue. I mean running into the issue in the article is a rarity these days, at
least on standard / mainstream hardware.

~~~
matthewmacleod
Different kind of “shell” code maybe? :)

