Reverse Engineering Snapchat: Obfuscation Techniques (hot3eed.github.io)
497 points by 3eed on June 17, 2020 | 167 comments



I’m surprised that no one has mentioned how this is actually accomplished. The answer is: largely automatically, at the compiler level.

In 2017, Snapchat acquired Obfuscator-LLVM and the people behind it. The project was actually partially open source for a period of time. It is a set of compiler passes for LLVM that obfuscates your code for you. You can read a bit about some of the techniques used on their old wiki:

https://github.com/obfuscator-llvm/obfuscator/wiki/Features (outdated)

https://www.bloomberg.com/news/articles/2017-07-21/snap-hire...


Funny thing about things like that is that you can likely write tools to automatically deobfuscate, if you know the mechanisms. Of course, this takes time and effort, and is beyond most spammers' capabilities.


I'm gonna write about this in pt. 2. Basically, you can use symbolic execution to recover the CFG[1] (using something like miasm), eliminate dead code, restore dynamic lib calls with emulation, and whatever else. But the point is that it would take an incredible amount of work and cooperation between tools, and then you wouldn't have even begun understanding anything about the binary, which is a whole other story. Now, there's a little shortcut to all of this which, when combined with a couple of tools, lets you make sense of things in this binary. I'm gonna reveal it in my next post.

[1]: https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm...
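
A toy sketch of the "recover the CFG, eliminate dead code" step described above. Everything here is an assumption for illustration: the graph `TOY_CFG`, the block names, and the brute-force `edge_is_feasible` check, which stands in for what a symbolic execution engine would decide with a solver. The always-true predicate `x * (x + 1) % 2 == 0` is the classic opaque-predicate shape used by bogus-control-flow passes, not something taken from the Snapchat binary.

```python
from collections import deque

# Hypothetical toy CFG: block -> list of (predicate, successor).
# A predicate of None means an unconditional edge.
TOY_CFG = {
    "entry": [
        (lambda x: (x * (x + 1)) % 2 == 0, "real_work"),  # always true
        (lambda x: (x * (x + 1)) % 2 != 0, "bogus"),      # never true
    ],
    "real_work": [(None, "exit")],
    "bogus": [(None, "entry")],
    "exit": [],
}

def edge_is_feasible(predicate, domain=range(-256, 256)):
    """Brute-force stand-in for a solver: can the predicate ever hold?"""
    return predicate is None or any(predicate(x) for x in domain)

def recover_cfg(cfg, entry="entry"):
    """Drop infeasible edges, then keep only blocks reachable from entry."""
    recovered = {}
    queue = deque([entry])
    while queue:
        block = queue.popleft()
        if block in recovered:
            continue
        successors = [dst for pred, dst in cfg[block] if edge_is_feasible(pred)]
        recovered[block] = successors
        queue.extend(successors)
    return recovered

clean = recover_cfg(TOY_CFG)
print(clean)
```

The `bogus` block disappears because no input can ever take the edge leading to it. A real tool has to do the same thing on machine code, where neither the blocks nor the predicates come pre-labelled, which is where the "incredible amount of work" comes in.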


Awesome write-up, really engaging! I enjoy the cliffhanger in the last line... "one strange trick"....


"Evan Spiegel Hates this Trick!"


Most obfuscation techniques are lossy. You lose information such as project structure, names of files, data types, variable names and so on. Decompilation and deobfuscation might give you a shadow of the original source code, but the benefits are overstated because the advantages over working directly with assembly code aren't that big. Most of the time is spent finding the dozen relevant functions out of 10000. If you truly need access to the entire source code, your time is better spent on an open-source project.


> You lose information such as project structure, names of files, data types, variable names and so on.

You lose half of those by not having debugging symbols and the other half by stripping the binary. This is all lost during compilation already, not due to explicit obfuscation. If you've ever worked with a compiler that is mediocre at generating debug symbols, you'll know it's the compiler doing extra work that provides all these, not obfuscation that removes them.


Couldn't agree more.


That works if the obfuscating patterns are all straightforward, like a regular grammar. But if it's not possible to distinguish an obfuscation from genuine code, that could quickly become intractable (NP-hard).


Generally obfuscated code is easy to spot, if not easy to reverse.


It's very unlikely that you actually can. It is kinda similar to why we cannot have the source of a binary even if we know how the compiler works.


We cannot have _the_ source, but we can have a good enough approximation of it, especially if a human is in the loop (see: commercial decompilation software like the Hex-Rays decompiler, Binary Ninja, and even Ghidra).


The point is that we cannot automate reversing these obfuscation mechanisms the same way we cannot automate reversing a binary file to a higher level than assembly.


This is not quite true, especially with current state-of-the-art tools like Ghidra, IDA Pro (with Hex-Rays), etc.

In fact, Rolf Rolles wrote a wonderful guest post[1] for the Hex-Rays blog about automating the reversal of this exact obfuscator, though he wasn't aware of its origins at the time.

[1]: https://www.hex-rays.com/blog/hex-rays-microcode-api-vs-obfu...


All these are great programs, but none of them can understand that level of obfuscation so far. As stated in the post, both Ghidra and IDA interpret the very first block in any of the obfuscated functions, which ends with an indirect branch, as a complete function in and of itself. That's because in the usual case an indirect branch at the end of a block is a tail call, which terminates one function and starts another with the same stack frame.

EDIT: also keep in mind the CFG isn't flattened here.


I think the idea is that Ghidra's and IDA's plugin systems allow for manipulation of binaries at a level that allows writing deobfuscators over them.


In the developer console:

Array.from(document.images).forEach(img => console.log(img.src = img.src.replace("http://hexblog.com", "https://hex-rays.com")));

to make the blog readable.


Exactly. Such tools are definitely possible, even if they rely on Ghidra or IDA's plugin systems.

What I like is the economics of the idea that one company can build an obfuscator, and then another company can build an anti-obfuscator which completely nullifies the value proposition of the first company.


This is an awesome write-up; I’m shocked at the level of effort that went into Snap’s obfuscation process. It implies that there are entire teams of engineers out there whose sole job it is to play cat&mouse with reverse engineers and nothing more. Another comment mentioned that this effort is outsourced, so not only are there teams, but entire companies dedicated to this!

What a blast that must be... though the immense amount of [invested|wasted] (take your pick depending on cynicism) effort spent on this game makes me a little sad. All of these brilliant minds just... cosplaying Sisyphus?


>What a blast that must be... though the immense amount of [invested|wasted] (take your pick depending on cynicism) effort spent on this game makes me a little sad. All of these brilliant minds just... cosplaying Sisyphus?

And we wonder why such a high % of tech workers have a deep discontent & are desperately searching for meaning.


I would find that a very fulfilling and meaningful project, personally. I'd actually consider it way more fulfilling than working on the core product, which likely mostly involves trying to think of and implement clever ways to expose users to ads and sponsored content, and otherwise try to directly and indirectly monetize users.

Here, the goal is to prevent phishers, fraudsters, scammers, spammers, catfish, impersonators, malware spreaders, etc. from running amok in a somewhat unprecedented way by tricking users en masse into thinking they're really receiving photos/videos in real-time, using automated tooling. My understanding is this heavy degree of obfuscation (combined with other anti-tampering tactics) has gone a very long way to mitigate a huge amount of abuse.

From talking to people who've tried to bypass these mechanisms to do unauthorized and potentially risky things (like send things from a custom client in a way that could allow for mass automation), they describe this as an essentially intractable hurdle from their perspective. Of course, it isn't in actuality, but it is for most people when compared to lots of other social media apps, and I expect Snap to change things around not long after OP releases part 2. Cat-and-mouse never ends.


> Here, the goal is to prevent phishers, fraudsters, scammers, spammers, catfish, impersonators, malware spreaders, etc. from running amok in a somewhat unprecedented way by tricking users en masse into thinking they're really receiving photos/videos in real-time, using automated tooling. My understanding is this heavy degree of obfuscation (combined with other anti-tampering tactics) has gone a very long way to mitigate a huge amount of abuse.

Which is STILL in the service of trying to expose users to ads and sponsored content.

I find it sad that people in our industry are so easily distracted by the technical challenge du jour without looking at the bigger picture of what their work is in service of, which was OP's point.


>Which is STILL in the service of trying to expose users to ads and sponsored content.

I agree with you; hence one of many reasons why I personally wouldn't want to work at Snap, for example. I guess just relative to the other things going on there, this at least is for a good cause at the object level, so if I were somehow forced to work there, I'd probably prefer this over product development, and, more importantly, I'd consider the goal of it a lot more worthwhile and good.

>I find it sad that people in our industry are so easily distracted by the technical challenge du jour without looking at the bigger picture of what their work is in service of, which was OP's point.

No, I was specifically disagreeing with OP's point: I was saying the meaning comes from preventing the abuse, rather than the enjoyment of the tech parts. The technical challenge justification was what I was trying to counter, though I maybe didn't make it clear enough. It's not about the tech, but the bigger picture of what the tech is in service of, even if that particular bigger picture is smaller than the overall big picture of the app and company as a whole.

That is, preventing malevolent people, and, in many cases, criminals, from exploiting, harassing, stealing from, and abusing users (many of whom are very young) in various ways. I think even if it were a company that was a million times less ethical, that'd still be a worthy thing to do, given that the company is probably going to exist and have lots of potentially vulnerable users either way.

Of course, in the grand scheme of things, you're still helping the corporation and keeping it in existence, yes. But I also don't think they're some dystopian corporation or something in this case. I myself personally do very deeply hate advertising, advertisements, adtech, whatever, you name it, but your phrasing of "what their work is in service of" makes it sound like Monsanto or something. They're not even anywhere near Facebook's level of badness (as far I'm aware, at least).

They make a fun app with a fun new communication paradigm that lots of people enjoy using, and they're trying to monetize it with ads. I'm not a fan of the app or the business model, but there are tons of way worse things in the world.


Are you saying you shouldn’t help an app that exposes users to ads prevent people from running automated fishing-for-nudes campaigns that have been used in the past to bully teenagers into suicide?


No, not this way. If my goal is really to push content to users, I’d get a set of accounts and automate the sending/scraping using a device emulator. Ultimately their efforts are better spent elsewhere if this is the actual goal. In reality the goal is to attempt to keep people from reverse engineering the API in order to create a custom, ad-free client.


> though the immense amount of [invested|wasted] (take your pick depending on cynicism) effort spent on this game makes me a little sad

That's an odd position to take. You seem to be ignoring the philosophy behind the cat&mouse game that is RE (and Security Engineering in general). What you call cosplaying Sisyphus is to me one of the most rewarding aspects of Tech. Breaking things is especially fun when somebody has made an effort to lock things down (and maybe even claimed it's "unhackable"). This is an area where you're still paid to solve puzzles and where taking the long view matters. RE is complex and hard, but exactly because of this it's one of the most rewarding things in all of CompSci.


I assume OP meant the securing side, not the RE side.


I can't help but wonder if it's more of a "a little from Column A, a little from Column B" scenario.

There's no doubt they have skilled security staff, but - as a company overall - they also grew very quickly.

How much of that obfuscation is intentional, and how much might just be old code from a few years ago that nobody got around to removing before it was passed through obfuscation?


The vast majority of these obfuscations (maybe except for the scratch arguments one) are done as LLVM passes, so they are applied after the code is written. Writing code like this by hand would be unreadable and unmaintainable.
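
To make "done as a pass, not by hand" concrete, here is a minimal sketch of what an instruction-substitution style rewrite does to a plain `a + b`. It is an illustration in Python of two standard arithmetic identities, under the assumption of 32-bit wrap-around; a real pass rewrites LLVM IR, and nothing here is claimed to be the exact substitution Snap uses. The function names are made up.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit wrap-around arithmetic

def add_plain(a, b):
    """What the programmer wrote."""
    return (a + b) & MASK

def add_substituted(a, b):
    """a + b rewritten as (a ^ b) + 2 * (a & b): same result, noisier code."""
    return ((a ^ b) + ((a & b) << 1)) & MASK

def add_with_random(a, b, r):
    """a + b rewritten as ((a + r) + b) - r for an arbitrary constant r."""
    return ((((a + r) & MASK) + b) - r) & MASK

# Spot-check that all three forms agree on random 32-bit inputs.
rng = random.Random(0)
for _ in range(1000):
    a, b, r = (rng.getrandbits(32) for _ in range(3))
    assert add_plain(a, b) == add_substituted(a, b) == add_with_random(a, b, r)
```

The source stays `a + b`; only the emitted code gets uglier, and a pass can pick a different rewrite (and a different `r`) on every build.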


I'd say they make it a priority to keep people from tampering with their code, and maybe to maintain the platform's integrity. They even ban people who use tweaks on jailbroken iPhones or rooted Android devices. I found these articles about avoiding Snapchat detection a while ago; it's a cat-and-mouse game.

https://aeonlucid.com/Snapchat-detection-on-Android/

https://aeonlucid.com/Snapchat-detection-on-iOS/


> It implies that there are entire teams of engineers out there whose sole job it is to play cat&mouse with reverse engineers and nothing more

Do you think the same thing of anti-spam teams too? This is pretty much just anti-spam/anti-abuse.


Personally I don’t see these as the same. These attempts to prevent RE are mostly moot: the machine needs to interpret the code, so it must be valid code, and some of this either bloats the binary or decreases performance. The user pays for these inefficiencies (which makes them user-hostile), and on battery-powered devices incurs additional costs through premature battery wear.


Snap is not usually something the user scrolls through all day (akin to FB/Insta infinite scroll); it's typically send a message or nude/receive a message or nude, and then it's backgrounded.

There is a rare case where people watch clickbait ads for hours, but that's usually plugged in, lying in bed with nothing better to do.


I very much beg to differ. Spend some time around the younger generations and you’ll see that you will have a hard time getting them to look away long enough to realize you’re even there.


I am one of them - Snap is generally not a "browse forever nonstop" engagement tool that's typical and standard behaviour for TikTok/FB/IG, it's actually somewhat used as a messaging tool (not that it precludes an excessively high volume of messages and groupchats)


Well, then I guess the others I see using it nonstop must be figments of my imagination, and somehow this all means obfuscation is now efficient. Also, go search Google; a 2-second search disproves you.


There are commercial products that largely automate these techniques through metadata. See Arxan.


Snap spent an awful lot of money on the facial recognition tech they acquired. I’d imagine the investment was somewhat worth it even if it only slowed down competitors' time to market.


I see that some are surprised by the level of obfuscation used in the application. As many have pointed out, many ingredients of the obfuscation used in the app are off-the-shelf, and a few of them can be said to be well known in the industry, but there is still a cost in integrating them into a product. Obfuscation is notorious for breaking things that work under a normal compilation process and, as an own goal, for making them hard to debug as well. Integrating, testing, and debugging, plus the difficulty of reading production crash logs, is a considerable cost.

That said, obfuscation is increasingly being used in mobile applications now. Check your banking application or some government applications and you will find obfuscation being used. With mobile applications getting richer and a lot of code executing on the client side, there is a compelling case for securing applications with obfuscation (as a defense-in-depth approach).

Open standards like the OWASP MSTG [1] (MSTG-RESILIENCE-9) recommend such an approach.

  Obfuscation is applied to programmatic defenses, which in turn impede de-obfuscation via dynamic analysis.

[1] https://github.com/OWASP/owasp-masvs/blob/master/Document/0x...


I think that it is due to the copycats that keep stealing apps and repacking them.

Most Android developers lack native coding experience, so after failed attempts to protect their applications with a DEX bytecode obfuscator, they think that recoding parts of the application with the NDK will save them.

However as this article shows, and most here know, they shortly learn that against good attackers, the only benefit from using native code directly is it takes a little longer to decipher what the application does.

So then one turns to solutions like what you are describing.


> they think that recoding parts of the application with the NDK will save them.

Yeah like that one app I reversed a while ago that generated the API key in a native library. I was able to get the key by building my own app around their library and calling the function that returns the key. Didn't even have to disassemble the thing.


Snapchat acquired Strong Codes in 2017. Before the acquisition they used Strong Codes' compiler for obfuscation.

https://www.bloomberg.com/news/articles/2017-07-21/snap-hire...


This is brilliant work, I'm hoping in part II we get to see it working against the API.

I reverse engineered this in a production environment. It took approximately 7 months to build a scalable solution.

The investigation on how to create the x-snapchat-client-auth token is brilliant. One day I hope to do a talk on what my old team did to circumvent it.

There's a painful gotcha on the homestretch for this token: You may be creating the token, but it's not obvious what you're supposed to be using the method to sign.

What do they use it for? As far as I could tell, it's so they can verify requests at the edge nodes of their network. When you provide a bad x-snapchat-client-auth, you get a near-instant 403.


I think the edge node is just checking whether x-snapchat-client-auth is valid, without checking whether it is valid for this particular request. The second check is probably done at a deeper level.
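
A minimal sketch of that two-stage idea, assuming an HMAC-based token. The key, the token layout (`digest:mac`), and the function names are all hypothetical; the thread does not establish how the real x-snapchat-client-auth value is built or where each check runs.

```python
import hashlib
import hmac

KEY = b"hypothetical-signing-key"  # assumption: stands in for the real secret

def request_digest(path: str, body: bytes) -> str:
    return hashlib.sha256(path.encode() + b"\n" + body).hexdigest()

def sign(path: str, body: bytes, key: bytes = KEY) -> str:
    """Client side: token = request digest + MAC over that digest."""
    digest = request_digest(path, body)
    mac = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return digest + ":" + mac

def edge_check(token: str, key: bytes = KEY) -> bool:
    """Cheap check: is the token well formed and correctly signed at all?"""
    digest, sep, mac = token.partition(":")
    expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return bool(sep) and hmac.compare_digest(mac, expected)

def deep_check(token: str, path: str, body: bytes) -> bool:
    """Costlier check: does the signed digest match this exact request?"""
    digest = token.partition(":")[0]
    return hmac.compare_digest(digest, request_digest(path, body))

token = sign("/hypothetical/login", b"payload")
print(edge_check(token), deep_check(token, "/hypothetical/login", b"payload"))
print(edge_check(token), deep_check(token, "/hypothetical/send", b"payload"))
```

A token replayed on a different path still passes `edge_check` and only fails `deep_check`, which would match the "near-instant 403 for a bad token" behaviour without the edge having to look at the request body.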


I'd be fascinated to read about your old team's work!


I remember back in 2013(?) I went to a collegiate hackathon in Santa Monica. Evan Spiegel showed up to walk the floor and someone showed him how they had sniffed the API and did something interesting with it (forget the particulars now, getting old). If I recall correctly, Evan offered the kid a job on the spot but the kid turned him down.

They've come a long way since then!


Was this perhaps LA Hacks [0] in 2014, or Hacktech [1]? Evan Spiegel attended LA Hacks, but I had someone who was attending Hacktech email me for help with the Snapchat API for their project. (I was part of Gibson Security, and published some early Snapchat API research [2] online in 2013)

[0] https://en.wikipedia.org/wiki/LA_Hacks

[1] https://medium.com/hacktech-2014/everyones-watching-hacktech...

[2] https://gibsonsec.org/snapchat/fulldisclosure/


Almost certainly HackTECH; it was held in a mall in Santa Monica right by the beach. I’m almost sure Evan came but it wasn’t to give a formal talk, but rather take a pretty low-key stroll-through. Maybe my mind is playing tricks on me. I did attend LA Hacks as well but I think it was in 2015, it was in Pauley Pavilion for the first time.

Your research looks fascinating and sounds similar to what I remember of the hack. Might be the same person we’re talking about. Small world!


I did some more searching, and I think it was Hacktech then. According to [0], Evan dropped by because of the project by Ash Bhat and Ankit Ranjan, apparently some of the organisers called him since he lived nearby. Seems you were right.

[0] http://appstorechronicle.com/2014/01/exclusive-snapchat-hack...


Hey Spiegel! If you see this I’m available for hire.


Snapchat is notoriously difficult to automate/spam.

The goal is to get the X-Snapchat token. The most elegant solution is to find the secret in the binary and reverse the algorithm to generate tokens. But wouldn't it be easier to MITM the endpoint? Set up a dummy server (which collects tokens) in front of a proxy that spoofs the DNS and TLS certs (this may be easier on rooted Android than on iOS).

In my last attempt I gave up and went for dumb UI automation, but it would be cool (and worth good money) to exploit the private API.


Certificate pinning spoils that, no spoofing of certs with pinning.

The cert (or a hash of it) is delivered with the app. If the server cert doesn't match the expected value coded into the app, someone is messing with something, so the app terminates the connection.
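
A minimal sketch of the comparison at the heart of pinning, assuming the pin is a SHA-256 fingerprint of the server's DER-encoded leaf certificate. The byte strings below are placeholders, not real certificates, and the function names are made up for illustration.

```python
import hashlib
import hmac

def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()

def connection_allowed(presented_der: bytes, pinned_fingerprint: str) -> bool:
    """Allow the connection only if the presented cert matches the pin."""
    return hmac.compare_digest(fingerprint(presented_der), pinned_fingerprint)

# Placeholder bytes standing in for real certificates (assumption).
GENUINE_CERT = b"genuine-server-leaf-certificate"
PROXY_CERT = b"mitm-proxy-certificate"

# In an app, this value would be baked in at build time.
PINNED = fingerprint(GENUINE_CERT)

print(connection_allowed(GENUINE_CERT, PINNED))  # True
print(connection_allowed(PROXY_CERT, PINNED))    # False
```

In a real Python client the presented bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)` after the handshake. A proxy with a cert the device's trust store accepts still fails this check, because the trust store is never consulted.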


Yes that complicates things. But if you can find the cert in the binary's data section, maybe you can patch it with your own.


This probably runs into the same issue mentioned in the article, where the checksums are wrong and you end up in an infinite loop.
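
A rough sketch of why patching trips an integrity check, assuming a CRC32 over the protected region. The article does not say which checksum is used, so CRC32, the 64-byte region, and the names below are all stand-ins.

```python
import zlib

def checksum(region: bytes) -> int:
    return zlib.crc32(region)

# Stand-in for a code/data region that contains the pinned cert.
ORIGINAL = bytes(range(64))
EXPECTED = checksum(ORIGINAL)  # value recorded at build time

def integrity_ok(region: bytes, expected: int = EXPECTED) -> bool:
    """True only if the region still hashes to the build-time value."""
    return checksum(region) == expected

# Simulate an attacker overwriting one byte of the embedded cert.
patched = bytearray(ORIGINAL)
patched[10] ^= 0xFF

print(integrity_ok(ORIGINAL))        # True
print(integrity_ok(bytes(patched)))  # False
```

The patcher then has to find and fix every place the expected value is stored or derived, which is exactly what the obfuscation is there to make painful.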


Assuming this is for Android, the APK would no longer be signed and would cause all login attempts to fail.

Have a read about "SafetyNet Attestation API" for Android.


You could patch Android and run it in an emulator. Or patch Snap not to care. Not super familiar, but there should be a way. Client side security can only do so much.


You can't patch Snap to not care, because the SafetyNet process works (roughly) like this: the app asks the Play libraries whether the phone is okay. This is verified (in part) on the Google servers, so the Snap servers can ask Google whether a call came from a non-tampered phone. The client can't do anything about it except trick Google into believing the phone is not tampered with, which is notoriously hard because nobody knows how the process really works.


In my experience, SafetyNet bypass on rooted devices has been a solved problem for a long time through Magisk Hide.


Except not all Android devices have Play Services. For example, is Snap available in China?


Just don’t do something like only trusting a specific CA cert. Really, really pin to a leaf.


My guess is the X-Snapchat token is a one-time-use token that changes on a per-call basis and may even be hashed together with the actual data being sent in the API call. For example, if Snapchat is sending a pic that has an MD5 hash of X, the token somehow encodes that or other information so you cannot reuse that token.

I’m confident the security engineering team at Snap has all kinds of white hat teams to prove and probe the security constantly.


According to an old AppSec talk, they used a third-party security company to implement this stuff. They are a customer of a company called ‘Arxan Technologies’ that implements these ‘guards’ in their software. They’re very good at not revealing this, but it came up whilst looking at their private API.

The secret keys are there but heavily obfuscated; it is nothing more than white-box cryptography, which can be bypassed via emulation.


Worked with Arxan before. They are legit - what is described here is the tip of the iceberg. Haven’t even gotten into in-memory instruction and data encryption. If you’re dumping the binary you’re likely not even seeing all of what is executing at runtime.


Well at this point, you might as well run the binary in a Mach-O ARM emulator since Snap has seriously cranked up the reversing difficulty to level 10,000.

I suggest anyone looking at this would need to use Corellium, since Snap has made it hard for almost anyone to get at their private API.


Your only hope for emulating the whole thing would be Corellium, really. Too many real-device-dependencies.


This was a few years back but I had token generation working with something much simpler than Corellium using https://github.com/unicorn-engine/unicorn emulator [You will need to set up CommPage, handle system and mach traps, load dyld, etc]. They've probably added more security since then but back when I looked at it some of the data that was encrypted in the token off the top of my head was:

- Request Path
- Timestamp
- Snapchat Binary Size
- Bit Flags for various hack checks such as jailbreak, checks for various tweaks, etc.
- Device Type
- iOS Version
- A pair of counters, I believe these were being used to detect real devices being used as signature proxies.
- A unique device ID generated at startup

I can't remember which one of the tokens this was for. There is an X-Snapchat-Client-Token used at login, if I remember correctly, and an X-Snapchat-Client-Auth-Token which is used for every request.

I never ended up using it for anything but it was a lot of fun getting token generation working through emulation. I'm not sure if I was actually able to bypass all their checks or if it would have been detected had I actually tried to deploy it for something in production.
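
For a feel of how fields like the ones listed above could be laid out before encryption, here is a sketch using Python's `struct` module. The layout, the flag names, the example values, and the function names are all invented for illustration; the real format is not public and this sketch leaves the encryption step out entirely.

```python
import struct
from enum import IntFlag

class HackFlags(IntFlag):
    """Hypothetical bit assignments; the real ones are not public."""
    JAILBREAK = 1 << 0
    TWEAK_LOADED = 1 << 1
    DEBUGGER = 1 << 2

# Assumed fixed-size header: timestamp, binary size, flags,
# two counters, and a 16-byte device ID (little-endian, no padding).
HEADER = struct.Struct("<QIIII16s")

def pack_claims(path, timestamp, binary_size, flags, counters, device_id,
                device_type, os_version):
    header = HEADER.pack(timestamp, binary_size, int(flags),
                         counters[0], counters[1], device_id)
    # Variable-length strings follow the header, separated by NUL bytes.
    tail = "\0".join([path, device_type, os_version]).encode()
    return header + tail

def unpack_claims(blob):
    timestamp, binary_size, flags, c0, c1, device_id = HEADER.unpack_from(blob)
    path, device_type, os_version = blob[HEADER.size:].decode().split("\0")
    return {
        "path": path, "timestamp": timestamp, "binary_size": binary_size,
        "flags": HackFlags(flags), "counters": (c0, c1),
        "device_id": device_id, "device_type": device_type,
        "os_version": os_version,
    }

blob = pack_claims("/hypothetical/endpoint", 1592352000, 123456789,
                   HackFlags.JAILBREAK | HackFlags.DEBUGGER, (7, 8),
                   bytes(range(16)), "iPhone12,1", "13.5")
claims = unpack_claims(blob)
print(len(blob), HackFlags.JAILBREAK in claims["flags"])
```

Because the request path and timestamp sit inside the blob, a server that decrypts it can reject a token replayed on another endpoint or at a later time, and the flag bits let it see what the client-side checks found.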


If you dig very deep, you can also find an offer to come work at Snapchat. Most will never find it.


Those who do have the skill to find it probably have better places to work for than a barely profitable company whose only revenue stream is to push trashy clickbait.


Barely profitable companies regularly pay big salaries for the right people, it’s why they’re barely profitable.


Hmm, I wonder how deep one should look. Why don't you shoot me an email at hot3eed at gmail? I'd appreciate it a lot!


I'm curious, can anyone recommend any techniques (or companies providing solutions) for attempting something similar with JavaScript in a browser calling an API? Obviously it's much more difficult to obfuscate an algorithm for generating a client token in JS than it would be in assembly, but I'm just curious if anyone has tried any form of "lock down my API so it's only callable from the web front end I provide" obfuscation.


>"lock down my API so it's only callable from the web front end I provide"

Generate secret tokens that the server can validate using some heavily obfuscated process. Compile the JS to WASM.

If you can use HTTP3 or WebSockets that's a bonus, because you can create a custom protocol that does some secret handshake before sending the goods.


Recaptcha tries to do this.

Their approach is to make a blob of code which collects all kinds of details about its environment (for example, Object.keys(window)). It then uses a hash/concat of those details (with some randomness too) to decode data that decides what else to collect, and hashes or concatenates those in too. Repeat a few times. Then send the final data blob back to the server.

The server can then run a tiny emulator to run the code with the same random seed to check that the results match those from a whitelist of allowed environments.
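
A small sketch of that scheme as described in this comment, not a verified account of how reCAPTCHA works. The probe names, the two example environments, and the function names are assumptions. Each round's running hash picks which probe to read next, so the collection order depends on both the seed and the environment.

```python
import hashlib

# Hypothetical probe results, as a browser script might observe them.
CHROME_LIKE = {"window_keys": "alert,atob,blur", "ua": "Chrome",
               "tz": "UTC", "langs": "en"}
HEADLESS = {"window_keys": "alert,atob,blur,callPhantom", "ua": "Chrome",
            "tz": "UTC", "langs": "en"}
ALLOWED_ENVIRONMENTS = [CHROME_LIKE]

def run_challenge(env: dict, seed: bytes) -> str:
    """Fold every probe into a running hash; the hash picks the next probe."""
    remaining = sorted(env)
    state = hashlib.sha256(seed).digest()
    while remaining:
        probe = remaining.pop(state[0] % len(remaining))  # data-dependent order
        state = hashlib.sha256(
            state + probe.encode() + b"=" + env[probe].encode()
        ).digest()
    return state.hex()

def server_accepts(answer: str, seed: bytes) -> bool:
    """Re-run the challenge for each allowed environment and compare."""
    return any(run_challenge(env, seed) == answer
               for env in ALLOWED_ENVIRONMENTS)

seed = b"per-request-random-seed"
print(server_accepts(run_challenge(CHROME_LIKE, seed), seed))
print(server_accepts(run_challenge(HEADLESS, seed), seed))
```

The per-request seed is what stops a captured answer from being replayed: the same environment produces a different blob for every seed, so an attacker has to run (or faithfully emulate) the collection code each time.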


You can study the Instagram or TikTok web versions for inspiration.

Both use some wacky methods for request signing that include encrypted code, obfuscated control flow, hashing the browser environment, ...

Assembly obviously allows for much more powerful obfuscation than JavaScript. WebAssembly is somewhere in between, but a viable path since it is pretty universally supported by now.

Network requests can be inspected trivially in the browser though, which makes things a lot easier.


There are other methods out there to hide network requests in-browser.


Sure, you could do all kinds of things. Using GET requests with an encrypted payload in the URL, websockets with some wacky custom protocol on top, WebRTC shenanigans, ...

I haven't seen anything like that in the wild, though.


WASM would be a good option I believe


> To make your life even more miserable, Snap occasionally deprives you of recognizing some basic standard lib functions ... You won’t be very happy after spending a day or two reversing a function to find it’s memmove in the end.

That sounds particularly devious.


Philosophically I never gave much thought to securing app client code.

Why not just track usage stats and ban clearly fake/high throughput users?
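
For reference, the server-side approach being asked about could look roughly like this sliding-window counter. The class name, thresholds, and account names are made up, and a production system would persist state and use far richer signals than request counts.

```python
from collections import defaultdict, deque

class ThroughputMonitor:
    """Ban accounts that exceed max_events within a sliding time window."""

    def __init__(self, max_events: int, window: float):
        self.max_events = max_events
        self.window = window
        self.events = defaultdict(deque)  # account -> recent timestamps
        self.banned = set()

    def record(self, account: str, now: float) -> bool:
        """Record one request; return True if the account is still allowed."""
        if account in self.banned:
            return False
        recent = self.events[account]
        recent.append(now)
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_events:
            self.banned.add(account)
            return False
        return True

monitor = ThroughputMonitor(max_events=5, window=10.0)
human = [monitor.record("human", t * 3.0) for t in range(5)]
bot = [monitor.record("bot", t * 0.1) for t in range(6)]
print(human, bot)
```

This catches the sixth request in half a second, but says nothing about a bot that paces itself like the "human" account here.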


I was thinking the same thing, but I believe you made the same mistake I made: I wondered why Snapchat would care about people SENDING stuff via their API.

The issue is pulling images and chats out and potentially saving them, without notification to the sender. If the API was public Snapchat could no longer promise that images are temporary, because an unofficial client could store the images.


Because Snapchat is ultimately an application designed to trade in porn of amateurs including (and perhaps especially) teenagers.

They have a vested interest in playing dumb to that fact. They can't really do so if the content escapes out into the wild and shows up in congressional hearings, lawsuits, FBI investigations, DOJ reports, etc.


I think you wildly underestimate how many people (teenagers included) use Snapchat for PG-rated things exclusively. The end-to-end encryption (of snaps) and “disappearing” nature make it work well for anything sensitive, but porn is certainly not the only thing people use it for.

Also, any party to a conversation can use the report button to send the unencrypted message to Snap for review. They employ actual content moderators as well, who have made reports to federal law enforcement before.


Sounds like a bit of lipstick on a billion dollar pig. I'm sure we all remember their early days, their marketing material was straight out of any drunken frat boy's phone.

I mean, pornstars say 90% of pornstars are selling content on Snapchat.

https://www.wired.co.uk/article/premium-snapchat-adult-model...


I think that's true for the beginning, and I'm not disputing your statistic, but as a percentage of the total userbase today, people using it for non-porn-related purposes are the vast majority. Even if nearly 90% of porn stars use it, nowhere near 90% of Snapchat's users produce or consume adult content on the platform.

Do you use Snapchat? Do you have friends who do? It's the de facto communication standard for teenagers because of Quick Add and the gamified nature, not porn.


No, I have no purpose for it, but I'm in my 40s. Twitter and Facebook are the only social media platforms I use.

To say Snapchat has no basis in trading porn is to say that pornhub could relaunch itself tomorrow and say "oh sorry we're just a youtube knockoff now, we're not in the adult industry."

Well, it would still say pornhub in the URL, wouldn't it. And it would still be a site whose entire userbase was built on trading porn. That's what Snapchat built and used to grow its userbase, so trying to re-image themselves after getting the money is dubious at best.


That was Snapchat maybe for like the first year after its launch. It's just a normal semi-ephemeral chat app now where you keep streaks going with your friends and screenshotting is similar to liking.


Orrrrrr you just got older?


Because not-so-clearly fake users still make it through. The server side is also fairly quick on the trigger: if you accidentally send anything that doesn't make sense, you're kicked off to "re-verification land".


Because the users who were less clearly fake would still degrade the experience of the rest of the users. To use an analogy, consider currency counterfeiting. The government doesn't just look to see who is spending lots of cash without a job because it's a much harder problem than making the bills extremely difficult for the layman to forge. Same principle here - making the token extremely difficult to forge is the easier route. You don't catch 100% of the bad actors in either scenario, but why not use all of the tools available in your toolbox?


I'm not sure this analogy tracks very well. The government doesn't bother heightening the quality of one dollar bills, either. The government doesn't have information about every transaction, a web API does.


These techniques are largely automated through metadata during a build process. It takes effort to set up, but not nearly as much effort as you think. The effect is asymmetrical: what takes you 1 hour to do costs a reverser 100. That is at least as time-efficient as implementing backend detection algorithms.


If an attacker can mess around with their private API, that attacker can find a vulnerability. If an attacker finds a vulnerability, the attacker can steal user data. Consider the average age of the users and then consider what could happen to the company if they have a data breach. Who knows, it may have already happened, and that's why they are so serious about it now.


This is some pretty heavy-duty obfuscation. What is the business case for this amount of work towards preventing reverse-engineering? Decent rate limiting should be much more effective than making such a herculean effort to obfuscate one's API.

Edit: another comment mentions that Snapchat uses an existing solution, which makes more sense than the expense of developing this sort of obfuscation in-house: https://news.ycombinator.com/item?id=23558784
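
For reference, the kind of rate limiting meant here is often a per-client token bucket. What follows is a minimal sketch, not anything Snapchat is known to run; the class name, parameters, and client IDs are all made up for illustration. It only limits each identified client, so it does nothing by itself against many fake accounts each staying under the limit.

```python
import time


class TokenBucket:
    """Per-client token bucket: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.now = now      # injectable clock (hypothetical hook, handy for testing)
        self.buckets = {}   # client_id -> (tokens_left, last_seen_time)

    def allow(self, client_id):
        t = self.now()
        # Unknown clients start with a full bucket.
        tokens, last = self.buckets.get(client_id, (self.capacity, t))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (t - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, t)
            return True
        self.buckets[client_id] = (tokens, t)
        return False
```

A server would call `allow(client_id)` once per request and reject (or send to re-verification) when it returns `False`.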


Specifically, Snapchat wants to stop "tweaks" and alternate frontends that allow for bypassing of Snapchat's self destruction controls.

This is especially important given that Snapchat is widely used to trade amateur underaged pornography.


Seems like hooking the UI layer and intercepting data on the wire would be a much simpler approach. I wouldn't even try to circumvent the UI flow or animations. The more 'user-like' the activity, the more difficult it is to distinguish automation from human traffic. This doesn't scale as well as many would like, but it can work. You could probably bundle something like this up and resell it as a grey-market API.

There may be some money in standing up a datacenter that is filled almost exclusively with smartphones.


How many of these tricks are off the shelf techniques? Seems like a tremendous effort.


OP here. About half are off the shelf. Joint functions, the breakpoint infinite loop, in-house memmove, the overflowing thing, those I haven’t read about anywhere before.


For the overflow, Jagex with RuneScape did it in Java. They also did stupid Object arrays 7 or so levels deep, doing casts on casts in between. The bytecode itself made the actual runtime slow to a crawl (anywhere from 5 to 10x slowdown.) This was circa 2014.


That’s interesting to know


> the breakpoint infinite loop

This is a fairly standard debugging technique.

> in-house memmove

You sure they didn't just statically link a libc?


> the breakpoint infinite loop

Have seen multiple times in CTFs


Usually the good CTFs don't stoop to stupid tricks like these, to be fair.


There are numerous commercial compilers (for C and C++) that specialize in obfuscation. I suspect they are using one because to do that level of obfuscation manually would make the source code unreadable.


Yup, they actually acquired anti-reverse engineering startup Strong.codes after using their software for years


OP, your comments are showing up dead so you might be shadow-banned.


Thanks for letting me know! I contacted support.


Your account's fine, but some comments were getting caught in a software filter. Sorry! I've marked it legit so this won't happen again.

(Fortunately users vouched for all the affected comments, so they were unkilled before mods got to this thread. That's exactly what the vouch feature is for and I love to see it work so well.)


Hi dang, why is my latest submission flagged?


You should send that question to hn@ycombinator.com


Do tiktok next, their obfuscation techniques are quite interesting. :)


Why not ;)


The best value of this kind of obfuscation is that it usually relies on a random seed, so every time you obfuscate you get different results. Once you update the app (and change the hash function), spammers need to do all the reversing once again for the new version.
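
A toy illustration of the seed idea, assuming nothing about how Snapchat's tooling actually works: the same constant is masked with a key drawn from a per-build seed, so the bytes that end up in the binary change with every build while the runtime value stays the same. Real obfuscators randomize far more than constants (control flow, instruction substitution, and so on); the function names and the secret below are hypothetical.

```python
import random


def obfuscate_constant(secret: bytes, build_seed: int):
    """Return (key, blob): the secret XOR-masked with a per-build random key."""
    rng = random.Random(build_seed)   # deterministic for a given seed
    key = bytes(rng.randrange(256) for _ in secret)
    blob = bytes(s ^ k for s, k in zip(secret, key))
    return key, blob


def recover_constant(key: bytes, blob: bytes) -> bytes:
    """What the shipped code would do at runtime to get the value back."""
    return bytes(b ^ k for b, k in zip(blob, key))


# Two "builds" of the same source with different seeds.
key_v1, blob_v1 = obfuscate_constant(b"hypothetical-secret", build_seed=1)
key_v2, blob_v2 = obfuscate_constant(b"hypothetical-secret", build_seed=2)
```

Both builds recover the same constant, but a byte pattern located in one build is useless for finding it in the next one.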


Not all of it, really ;)


One thing I'm curious about is what they do to try to stop you from just ripping out the obfuscated token generation library and setting up a harness to run the whole thing in https://www.unicorn-engine.org/ or something. Like presumably they don't compile their whole app with obfuscation and it's just some library that's linked in with some kind of stable-ish API contract with the rest of the app. I wouldn't be surprised if they do interesting things to try and stop you from ripping it out and it'd be cool to learn what those are.


You could manage to isolate these functions. The problem is that it's much of a hassle to run the whole thing on an emulator because there are way too many real environment dependencies, and even if you go the hackery way and patch all those, you won't know if you're generating one with the correct parameters because you're treating the whole thing as a black box.


Why wouldn’t they obfuscate the whole app?


That'd be a noticeable performance hit I'd say.


I think someone used this route but ran the real binary on real phones that had been injected with his code that allowed tokens to be generated.


All this to keep Snapchat from being recorded. Well, it's a cat-and-mouse game and currently the cat is winning, as in using Memu on my PC allows me to record everything happening there, your crush's nudes and dances included.


This sort of thing has been prevalent in the game world for decades.

I once had the chance to work on a project disassembling casino machines, and they had similar protection appropriate for the technology of the time


There are already alternative front-ends for YouTube, Facebook, and Reddit. I’d love to see one for Snapchat and Instagram, although it looks like one for Snapchat would be incredibly difficult.


Most usable alternative front-ends for such services usually just parse the web API, or even plain HTML in the case of YouTube. Snapchat doesn't have a web app, so parsing it is tricky, and Instagram provides only a limited set of features on the web.


This is what the top 1% of MIT grads work on. Obfuscating IP for a user data company.

It’s clever, but man... I have to believe these talented folks were destined for something greater.


How would one go about understanding the content of this write-up? Even after the first paragraph it begins to go completely over my head.


You need some assembly background; then OWASP's guide[1] has all the basics.

[1]: https://github.com/OWASP/owasp-mstg


Thank you very much.


I would love to be able to make a bot for the snapchat group my friends and I have. We already have a blast using it now. A bot that could randomly do things that we could all interact with would be hilarious. Sadly I don't think this functionality will be introduced. So it will be cool to maybe slap something together before all of this gets fixed.


Interesting read! I'd love to read the next post, but at least Miniflux can't find any feed.

3eed, would you be open to adding an RSS feed?


Sure! I'll consider doing this before the next post


Thanks!


Yes please!


How does one go about learning reverse engineering? Is it mostly by practicing? Are there any good up-to-date resources?

I remember taking a reverse engineering course in the university where the professor didn't even bother to explain the basics, it was like black magic and left me frustrated, but I still feel amazed when I read blog posts like these.



I am interested in learning RE also. After some searching on the internet I found that most people recommend the Practical Malware Analysis book. I started reading it, and it seems pretty interesting. I didn't get to the RE part yet, but from looking at it, it seems to be pretty good for a beginner.


I was wondering if there are any steps a developer of a small app can take to add such a header and lock down the API so it only answers to said header. This level of obfuscation doesn’t seem doable for smaller shops. Is there something simpler, that is “good enough”?


Security is a continuum: how many resources you can put into fending off prying eyes depends on how valuable your assets are, and so on how many prying eyes are targeting you. But as a start, OLLVM is open source and not bad at all.


JWT perhaps?
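
For a small shop, the simplest version of "a header the API insists on" is an HMAC over the request, keyed with a secret shipped in the app. A minimal sketch follows, assuming a shared secret and a made-up message layout; none of this is Snapchat's scheme. Anything embedded in the client can be extracted, so this only raises the bar; the obfuscation discussed in the article is what makes extraction expensive.

```python
import hashlib
import hmac
import time

SHARED_SECRET = b"replace-me"   # hypothetical; extractable from the shipped app
MAX_SKEW_SECONDS = 300          # reject stale timestamps to limit replay


def sign_request(method, path, body: bytes, timestamp: int, secret=SHARED_SECRET) -> str:
    """Client side: compute the value for a (hypothetical) signature header."""
    message = b"\n".join([method.encode(), path.encode(), str(timestamp).encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(method, path, body, timestamp, signature, now=None, secret=SHARED_SECRET) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(method, path, body, timestamp, secret)
    return hmac.compare_digest(expected, signature)
```

The client sends the timestamp and signature as headers alongside the request; the server answers only when `verify_request` returns `True`.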


I have been advised by researchers in the field that it takes about a day with an optimizing compiler to de-obfuscate most any piece of commercial software of this size, with a good team. With a less than great team, perhaps about a week. Is that true?


Definitely not true.



Wow, that seems really messy. If you're just after the API key or whatever, wouldn't reversing the Android app be simpler? As far as I know, you can't do all these low-level tricks on the Java platform.


Android was way easier until Google started helping out and introduced SafetyNet attestation. I think it was originally coded for Google Wallet/Pay but Snapchat were definitely using it from early on. Pokemon Go use it as well.

Apple have no such system as far as I know, and if they do, they don't share it with 3rd party developers.


Not only does Android use obfuscators as part of the build process, most applications make use of native code for their secure modules.


Java obfuscators exist too. I don't know how they compare in terms of complexity to reverse though.


I think it's mostly native code on Android too except for the UI.


> In Mach-O binaries, functions whose pointers are in the __mod_init_funcs run before main.

Remember that obfuscation makes your code run slower. This specific one is part of the reason why the dyld team probably hates you.


The performance impact is not as much as you’d think, speaking from C/C++ land. I had a secured video player that was using these techniques, and even with the dial turned all the way up, it was costing 1-2% CPU and no human-detectable latency


to answer everyone asking 'why do they do it????' it's because of spam, that simple. they don't want:

a) outbound bots, created in bulk, messaging millions of users.

b) inbound chatbots that answer messages.

c) when they had snapcash, they didn't want generated bots collecting cash.

spam is a multi million dollar industry.

@3eed i guess it's not considered obfuscation but you gotta pass the correct version # or you won't be able to connect either, old versions are immediately obsolete.


oh whoops forgot they also don't want scripts following users then logging all their private pics/posts without flagging it as being screenshotted which defeats the purpose of the app.


Are any of these obfuscation techniques possible on the web? My guess is no, but just curious.


Things were definitely much simpler a couple years ago.


I'm actually a bit surprised it's taken as long as it has for mobile obfuscation to catch up. Both on the Android bytecode and the iOS native code side there are obvious PC analogs in the form of .NET obfuscators for MSIL and packers and game DRM for native code. Of course, the Obj-C runtime throws a little wrench into things on iOS, making it a bit of a hybrid - but the approaches are still similar.

It's still an ultimately pointless cat and mouse game as long as an attacker can trace the hardware (and with full-platform emulation tools like Corellium, this capability is unlikely to go away soon), but still an amusing one to watch.


I've seen this for _years_ already - it might also be because of a bit of an app usage difference, where the people that typically do these writeups or browse HN don't install the shitware on the app stores that do these things, except for rare common cases like large communications platforms (Snap)

Many Asian mobile games run malware DRM like this - https://www.wellbia.com/home/en/pages/xigncode3-for-android/

They are incredibly invasive and insecure, exfiltrate tons of PII (location, private IP, mac, ...) where possible unencrypted to bare IPs in various countries. This company in particular also has a PC-based anticheat rootkit that doesn't prevent cheating and allows the developer to "remote control the user", which is also an advertised feature.


Very nice article. Great piece of work.


Great write up! Thanks for posting!


Will Apple approve an app with this level of obfuscation in its source? I thought they had to have the source itself?


No, they don't. You can provide them with symbol files for your application so they can symbolicate crashes on your behalf, but this isn't required. (Interestingly, there are teams at Apple that reverse engineer applications for compatibility reasons, and the occasional "someone got an obfuscated binary past app review and we need to know what it does".)


Huh, thanks. I thought they reviewed the source of all apps to make sure they aren't doing anything naughty.


That would be ridiculously expensive.


Apple doesn't need the source of your app, just some bytecode that they can optimize for the target platform. As for making sure that a certain app isn't using private frameworks, they can just do that through testing.


Bytecode is not required for applications targeting iOS. And I will note that the latter is fairly difficult to actually check in principle, and it's mainly enforced (actually, in certain cases it's not ;) ) in practice by the threat of consequences if they catch you doing it rather than testing.


So providing bitcode is still optional for release on App Store?


As long as you are not targeting watchOS or tvOS.


Thank you for correcting me! I really expected that by now they'd be using it for all devices.


Doesn't Snapchat mainly rely upon the iOS or Android platform having some software that prevents screen shots if a 'no screen shots' flag is set? I always thought this was their core defense.


Their "core defense" for an end user is to notify the sender of a message if their post was screenshotted by using native screenshot detection/notification libraries provided by the OS, combined with heuristics to try to prevent them from being hooked and bypassed.

However, the binary level protection is only tangentially anti-screenshot insofar as it also blocks third-party 'scraper' apps. Fundamentally, it's about requiring use of the Snapchat app to access the Snapchat APIs, and as we know from the ongoing saga with other social networks like Twitter, this is ultimately about business control over posts (preventing spam-bots, scheduled clients, and so on) and at the end of the day, ad revenue.


This is also trivially bypassed by the following technique:

1. Open the Snapchat app for a bit so it is downloaded / cached.

2. Turn on Airplane mode.

3. Screenshot and screen record to your desire.

4. Delete the whole app.

5. Turn off Airplane mode and download the app again.



