
Reverse engineering machine code is not trivial if it is obfuscated, has its symbols stripped, and has its Objective-C metadata obfuscated/encrypted.

LLVM bitcode is much easier to reverse engineer than machine code, and you will be handing Apple all your symbols and metadata as well.



Symbols are obfuscated or removed entirely if you choose to build without them; you don't have to hand Apple your .dSYM if you don't want to. I'm not sure what other metadata you're referring to.


I am referring to Objective-C metadata, which programs like IDA Pro can use to identify function names/parameters, which give valuable insights to reverse engineers.

With LLVM bitcode, it doesn't matter if you don't hand over your .dSYM file to Apple, all the symbol information (and more!) is available in the bitcode.


There is metadata in a fat binary (or in bitcode) that makes reversing more convenient, but there is none that makes reversing possible; reversing raw ARM assembly spat out by a normal compiler backend is always possible.

It's 2015. Lack of source code is no longer a meaningful obstacle for attackers.


I agree, reverse engineering is always possible :) However, reverse engineering with the LLVM bitcode present is much much easier, which is a problem.

Also, if Apple requires apps to be submitted as LLVM bitcode, then things like integrity checks and certain useful types of obfuscation will become impossible, making apps much easier to reverse engineer, or to pirate and repackage on another app store.


I disagree that LLVM bitcode makes reverse-engineering any easier on iOS. You can do everything in bitcode that you can do in native ARM code. The limitations are already there in terms of code signing, sandboxing, and page protection, and bitcode distribution doesn't change any of these OS attributes.

Are you concerned about Apple reversing your code? You can still use arbitrary types (e.g. int64 instead of a pointer) and give them no additional data about the structure of your code. As you pointed out elsewhere, tools like OLLVM perform obfuscation at the bitcode level already.

I'm guessing Apple made this change to make it easier to do program analysis of iOS apps being published. You can certainly find bugs more easily in bitcode that has proper type information and clear differentiation between safe and unsafe branches. One of the first steps in program analysis tools is to lift native code to an IR in order to determine its correctness.

With bitcode distribution, Apple has potentially made it easier on themselves to skip the lifting and type recovery steps for conforming apps in order to look for bugs. But unless they start requiring that bitcode conform to certain additional standards, you can always transform bitcode to obfuscated bitcode, destroy type information via aliasing, etc.


I concede that LLVM bitcode can be obfuscated just like machine code (with enough effort, and LLVM bugs aside). However, you're still losing the ability to self-checksum your code, which in turn means you can't protect against things like piracy or unwanted modification of your binary.

I agree also that Apple probably made this change to better analyze programs being submitted to the app store. That and to recompile programs to use intrinsics more efficiently, as new intrinsics become available.

And finally, while I would not be concerned about Apple reversing my code, certain companies are and have to undertake steps to make that as difficult as possible.


Yes, self-checksumming becomes impossible [1].

If you are concerned about anyone reversing your code (Apple or otherwise), you must obfuscate it. Either you work at the ARM level or the bitcode level (watchOS for now), but the basic techniques are the same. Or, you can avoid changing tools by doing source-level transformation.

One of the companies probably impacted by this change is Arxan, along with other obfuscators. They have to change both their frontend and backend to be LLVM -> LLVM.

[1] Except for the case I mentioned before, where you predict the generated code for known architectures and use that to generate your constants. This still breaks if Apple generates code for a new architecture without your help.


I dispute the claim that bitcode makes this task so much easier that it presents a real threat. Everyone with a financial or policy interest in backdooring Signal can do so on stripped ARM binaries without any trouble.


IIRC Signal is open source, so anyone can simply insert a backdoor into the source code and compile it, and then distribute the modified binary.

However, I'm talking about applications for which the source code is not available. If you properly use integrity checks, signature verification, and obfuscation, you can make repackaging and piracy of applications significantly more difficult. With LLVM bitcode this is no longer the case.


There are already ObjC selector obfuscators that generate a random secret and hash all program values with that on each build.

https://github.com/Polidea/ios-class-guard


Thank you, I had forgotten about class guard. It's a great tool, but I recall having problems getting it to work on larger applications.

It's also possible for you to remove the Objective-C metadata from a binary entirely, obfuscate/encrypt it, and then add it back to the Objective-C runtime as needed. With LLVM bitcode this becomes much more difficult (but maybe not impossible...)


You should really look at the bitcode that is actually submitted. It seems your opinion is based on what clang generates usually.


I am actually looking at that bitcode right now :) You can use this program ( https://github.com/AlexDenisov/bitcode_retriever ) to extract it from the Mach-O file, then use xar to decompress the archive, then use llvm-dis to disassemble the bitcode. You will see it's chock-full of information that is otherwise not available in the machine code.


Which Mach-O file are you looking at? Is it the one generated by the compiler or the one after linking in the bundle submitted to the store?


Symbols don't matter at all.

Meanwhile: LLVM's ARM backend is not an obfuscating compiler backend.

If you are obfuscating your code, yes, Bitcode is a problem. But Signal is not obfuscated.


Sorry, there might be some confusion.

This entire time I was referring to code obfuscation/integrity checks, and why bitcode is a problem for people who want to obfuscate their code. I was not referring to apps like Signal at all.

As an interesting aside, LLVM's ARM backend can be made into an obfuscating compiler, and furthermore some people have started obfuscating the LLVM bitcode itself: https://github.com/obfuscator-llvm/obfuscator/wiki http://0vercl0k.tuxfamily.org/bl0g/?p=260


Yes, if you were going to write an obfuscator, LLVM would be a good place to start. But, to a first approximation, nobody in the world obfuscates their mobile apps.

Actually, I should raise the 'NateLawson alarm; he can probably tell us precisely how many mobile developers obfuscate on either platform.


Good question. Obfuscation is more common on Android (10% of apps) than iOS (1%), but it's also pretty interesting where it's used most.

First, throw out Proguard on Android. That's not really obfuscation, it's an optimizer. Sure it renames variables and removes dead code, but that's only a problem if you have a naive system that relies on class & method names. Proguard is used in 20% of Android apps though.

The most common issue with Android is the malleability of the bytecode it inherited from Java. It's common to patch apps or repackage them for other app stores (after swapping advertising API keys). Unlike with native code, no work is required to rebase addresses or anything after patching. This drives the need for obfuscation, but also provides a useful springboard for implementing it.

We've found that many of the obfuscation schemes on Android are custom. This is because of the liberal policy towards using native code via JNI and the ease of inserting a custom Java classloader. With iOS's code signing and memory protection, you can't write a custom loader in your app that decrypts and executes new code.

However, every other obfuscation measure is available in iOS. You can build opaque predicates, mangle control-flow, do arbitrary pointer arithmetic, and lots of things that are impossible in Dalvik. (However, you can just write a thin Java wrapper around your .so in Android and you've got the same capabilities and more.)

With LLVM bitcode distribution, you're really potentially losing only one thing in your obfuscation toolbox: self-checksumming. Self-modifying code was already impossible due to code signing, so nothing changes there.

While I haven't implemented it myself, I believe you could still even do self-checksumming via clever use of intrinsics. That is, you carefully lay out the instructions and data and predict what you'll expect to see when it's translated to armv7, v7s, arm64, etc. That's the value you'll check for.

http://llvm.org/docs/ExtendingLLVM.html


Actually, before LLVM bitcode, self-checksumming was quite possible on iOS, and was widely used by many commercial obfuscators. Furthermore, self-checksumming is the root of all good anti-tamper schemes. It prevents someone from automatically patching out your license checks/anti-debug/anti-jailbreak/whatever. Without self-checksumming, someone can easily patch your app to pirate it, inject a backdoor, or modify it in some unwanted way. So it's really a big loss if you care about those kinds of things.

Self-checksumming via intrinsics and clever layouts is a very interesting idea, but it's impossible to checksum a program when you're not 100% sure what it will look like after compilation. For example, even if you managed to correctly 'guess' what machine code your LLVM bitcode would be translated to and adjusted your checksums accordingly, Apple could just update their LLVM bitcode compiler at some point in the future and invalidate your checksums.

Also, Android is a much more interesting platform than iOS, because it allows for self-modifying code, which various commercial obfuscators use in interesting ways :) There are also very interesting things you can do with reflection or the JNI in Java.


I agree re: checksumming.

Regarding Android's flexibility, I've written an obfuscator (on another Java-based platform) that took a DSL and generated half the code in Java and half in C, intertwining the computation between the two processors via IPC.


The majority of people might not obfuscate their mobile applications, but I think that's a problem in tooling and awareness. A lot more would if they could, and if they knew how.

That being said, there is also a very sizable minority of companies that do obfuscate their mobile applications.



