
Circumventing the JVM's Bytecode Verifier - half-kh-hacker
https://anthony.som.codes/blog/2019-12-30-jvm-hackery-noverify/
======
nneonneo
Way back when I was messing with Android reverse engineering, there were
already a number of obfuscation/protection systems which screwed with Dalvik
VM internals. One particularly memorable one had a native library, written
using a completely incompatible ARM ABI (using the stack pointer as a normal
register, a different register for stack-like operations in the opposite
direction, using random registers and stack slots for arguments, etc.), whose
only job it was to patch the crap out of the Dalvik VM so it would load their
custom obfuscated VM bytecode. The main issue (and the reason this kind of
obfuscation seems to have gotten less popular) was that it depended extremely
heavily on Dalvik internal structure offsets, and had a massive table of
version-specific offsets and patch code which presumably became unmaintainable
with all the extant versions of Android.

Anyway, it’s fun to look at ways to obfuscate bytecode. It’s far too easy to
decompile unobfuscated Java code to pretty much perfect source code these days
(same goes for any .NET code) - you really do need a little bit of obfuscation
to prevent people from trivially stealing your code.

~~~
hinkley
I've heard similar tales about the JRE as well. Cross compilers and
obfuscators found many, many patterns that the spec 'allowed' but the runtime
did not.

I think the biggest trick I know of in obfuscators is abuse of overloading
(calling all functions 'a' until you run into a collision on argument types,
then calling those methods 'b'), which is still gibberish to most of us once
decompiled. The nastiest one I saw was that someone realized that keywords and
symbols are reserved in Java but not in the JVM. So they started naming things
"int" or "{".

I wouldn't be surprised if that's now old hat for decompilers though.
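
To make the overloading trick concrete, here is a minimal sketch of what renamed code could look like. The class name, the method bodies, and the "original" names in the comments are all made up for illustration; this is not the output of any particular obfuscator.

```java
public class OverloadDemo {
    // Hypothetical original: static int parsePort(String text)
    static int a(String s) {
        return Integer.parseInt(s);
    }

    // Hypothetical original: static int doubleIt(int value)
    // Different argument type, so the name 'a' can be reused.
    static int a(int x) {
        return x * 2;
    }

    // Hypothetical original: static String describe(int port)
    // Same argument types as a(int), so the renamer has to move on to 'b'.
    static String b(int x) {
        return "port " + x;
    }

    public static void main(String[] args) {
        // Which 'a' is called depends only on the argument type.
        System.out.println(b(a(a("4040")))); // prints "port 8080"
    }
}
```

This only shows what Java source allows. At the bytecode level a method is identified by its name plus its descriptor, and the descriptor includes the return type, so a tool working directly on class files has more room to reuse names than this sketch suggests.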

~~~
nneonneo
Yeah, calling everything “a” is pretty much the only thing ProGuard (the
default Android obfuscator) really does that impacts reverse engineering in
any way. Most reverse engineers I know working with Android already have good
tools to deal with this (interactive renaming, very similar to refactor
renaming, works wonders; as does the fact that most Android code leaves class
and sometimes method names intact in logging and debug infrastructure!).

Renaming classes and methods into annoying names is something I see with .NET
much more frequently - one obfuscator I’ve seen basically uses different mixes
of Unicode spaces to name everything, which is pretty cute. Unfortunately, most
of these techniques are easy to work around as a reverse engineer.
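
On the JVM side, the gap between names that are legal in a class file and names that are legal in Java source (the "int" or "{" trick mentioned upthread) can be sketched roughly as below. The class and method names are hypothetical. The JVM check follows the class-file rule that an unqualified name must be non-empty and must not contain `.`, `;`, `[` or `/`. The extra restriction on `<` and `>` in method names is left out, and the keyword set is deliberately partial.

```java
import java.util.Set;

public class NameRules {
    // Partial keyword list, for illustration only (an assumption, not the full set).
    private static final Set<String> SOME_KEYWORDS =
            Set.of("int", "class", "if", "for", "return", "void");

    // Class-file rule for unqualified names: non-empty, none of . ; [ /
    static boolean isLegalJvmUnqualifiedName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == ';' || c == '[' || c == '/') {
                return false;
            }
        }
        return true;
    }

    // Approximate Java source rule: identifier characters only, and not a keyword.
    static boolean isLegalJavaSourceIdentifier(String name) {
        if (name.isEmpty() || SOME_KEYWORDS.contains(name)) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a", "int", "{", "a/b"}) {
            System.out.println("'" + name + "' jvm=" + isLegalJvmUnqualifiedName(name)
                    + " java=" + isLegalJavaSourceIdentifier(name));
        }
    }
}
```

Names such as "int" and "{" pass the first check and fail the second, which is why a decompiler has to rename them before its output can be recompiled.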

~~~
atesti
What are the best tools for reverse engineering Android apps?

For me, jd-gui did not work very well: it was unable to decompile many
long methods and just threw errors. Is there any actively developed JAR or
Dalvik decompiler?

~~~
nneonneo
I used a mix of apktool (reading raw Smali code) and jd-gui for a while, but
recently transitioned mostly to using JADX:
[https://github.com/skylot/jadx](https://github.com/skylot/jadx)

JADX is really nice - there’s a decent GUI, and it also supports exporting the
whole thing as an Android Studio project so you can directly take advantage of
the refactor tools there for more deobfuscation. It also has a smarter
decompiler than JD as it’s specialized to go straight from dex to Java (JD
makes a trip through the classic JVM class format, which loses some nuance of
dex).

Of course, many of the apps I’m taking apart have native components, for which
I use Ghidra and IDA.

------
joshstrange
This is a bit over my head and I'm out of the JVM world now but I found it all
very interesting and well written. One small thing I'd suggest is to change
how footnotes are done. I was very confused when I hit the first footnote "1"
but there was no title text and it wasn't a link so I just scrolled to the
bottom of the article and saw "1. Employing this technique seems to work on
Windows and Linux..." and I was very confused as that had nothing to do with
"three million devices." The "1: n = 3,000,000; ..." in the next paragraph
didn't register when reading through. I thought it might be a "pull
quote"-type thing that would be explained below. There are about a million
ways to accomplish footnotes so I won't try and say which is best but some
color difference (both footnote number and then the text for the footnote)
might be helpful in making it more obvious and seems like an easy change.

Again, great article and I really don't mean to nitpick, the footnotes just
confused me a little.

------
peter_d_sherman
A must-read for anyone implementing security on a language-specific VM or
runtime environment in the future...

~~~
saagarjha
TL;DR if you give people a RW primitive out of the gate you’re going to have a
bad time.

------
Izmaki
You explained the concept like a senior with tens of years of experience.
First year at university... dang. Respect.

~~~
half-kh-hacker
To be fair, this is three-and-a-bit years of experience in this area, so it's
not like I've learned this in my eight weeks of lectures...

