Reverse Engineering the Facebook Messenger API (intuitiveexplanations.com)
161 points by luu on April 7, 2023 | 38 comments



It would've probably been much less ugly if the author reverse engineered one of the mobile apps instead of the web version. Those must use a much saner API that actually feels like an API.

I myself reverse engineered and patched Instagram for Android once because I got fed up with ads in my feed, but there's a high probability that all Facebook apps use the same "proxygen" HTTP client library. This library is notable because it is written in C++, thus requiring JNI bindings on Android, and that part can't be obfuscated — so, easy to find and fully stable across versions.

So I just added some calls to my own code into the bytecode of those JNI bindings (Java source -> javac -> dx -> baksmali -> copied to "classes" in the apktool project directory) and started logging requests to see how the ads are returned. I then built a response rewriter to remove the ads. If you have a spare Android device, you could even avoid messing with the app entirely by injecting your own code into the unmodified app using Xposed Framework. Xposed is very underrated as a reverse engineering tool.
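
To illustrate the response-rewriter idea, here is a minimal sketch in Python (the patch described above was written in Java and spliced into the app). The comment doesn't describe the real response format, so the `feed_items` key and the `is_ad` flag are hypothetical placeholders.

```python
import json


def strip_ads(body: str, items_key: str = "feed_items", ad_flag: str = "is_ad") -> str:
    """Return the JSON response body with ad entries removed.

    `items_key` and `ad_flag` are hypothetical names; a real rewriter
    would use whatever fields the logged responses actually contain.
    """
    payload = json.loads(body)
    items = payload.get(items_key, [])
    # Keep only the entries that are not marked as ads.
    payload[items_key] = [item for item in items if not item.get(ad_flag, False)]
    return json.dumps(payload)
```

The point is that once requests can be logged at the HTTP client layer, removing ads is just a filter over the decoded response before the app sees it.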


> Xposed is very underrated as a reverse engineering tool.

I fully agree, but on the off chance you don't know about it already, I highly suggest you look into Frida as well. It can do all the same things Xposed can (hook function, see or change output), but is a lot faster to iterate on since you don't need to compile anything and can even do some things in a REPL.

That, coupled with the new jadx-gui's "copy as frida/xposed" snippet and the usual suspects for network monitoring (mitmproxy/charles), is a crazy powerful Android reverse engineering workflow.


I've heard of Frida but haven't tried it myself yet.

The thing with Xposed that makes iteration slow isn't even the fact that you need to compile your module. That part is fast. You also need to reboot the device to apply your changes, that's the slowest part.


I know Kotlin and Java. Can you point me to any resources on this?


Not sure. I started reverse engineering Java apps very early in my life — initially it was J2ME games. Decompilers of the time sucked but that didn't stop me from modding Gravity Defied :P

I honestly don't know what's a good way of getting started on reverse engineering. There's a bunch of everything about Windows executables in particular, including "crackmes", but native machine code is a level up from JVM bytecode. Java classes and Android dex files can be decompiled back to sensible source with a good chance that you get something that can be compiled again. No such luck for native code — C/C++ compilation is a lossy process by its nature, especially the optimizations. Ghidra does a decent job but still requires a non-zero amount of manual assistance. Flash games also were good to hone one's reverse engineering skills since ActionScript decompilers did a pretty darn good job.

Anyway. To decompile dex to Java source, there's jadx: https://github.com/skylot/jadx

Since decompilation is sometimes lossy, there's apktool for when you want to put the app back together after tinkering with it: https://github.com/iBotPeaches/Apktool

It goes without saying that you also need a JDK and the Android SDK. In particular, you need apksigner from the SDK to sign the unsigned APKs generated by apktool. You can also automate things a bit and use adb to deploy them to your device.

What I usually do is get a high-level overview of the app in jadx, and then modify the smali (dalvik bytecode in text form, very assembly-like) files generated by apktool.
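
The decode, rebuild, sign, deploy loop described above could be automated roughly like this. It's a sketch, assuming `apktool`, `apksigner` and `adb` are on your PATH; the file names and keystore path are placeholders, and `apksigner` will prompt for the keystore password.

```python
import shlex
import subprocess


def rebuild_commands(apk: str, workdir: str, keystore: str) -> list[list[str]]:
    """Build the command lines for one decode/rebuild/sign/install cycle."""
    unsigned = f"{workdir}-unsigned.apk"
    signed = f"{workdir}-signed.apk"
    return [
        ["apktool", "d", apk, "-o", workdir],        # decode to smali + resources
        # ... edit the smali files in `workdir` by hand at this point ...
        ["apktool", "b", workdir, "-o", unsigned],   # rebuild an unsigned APK
        ["apksigner", "sign", "--ks", keystore, "--out", signed, unsigned],
        ["adb", "install", "-r", signed],            # reinstall on the device
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

In practice you'd run the decode step once, then loop over the last three while you tinker with the smali. Note that a re-signed app won't install over the original, since the signatures differ, so uninstall the original first.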


Sounds like app "Instander"


Ouch. The API designer inside me is shivering at that monstrosity.

> You might ask why the Messenger API expects a JSON string inside a JSON string inside a JSON string inside an HTML form. You would have a very good question.

Probably multiple layers of tooling and encapsulation to get to the real backend, or just unpaid tech debt.
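
For anyone who hasn't seen this pattern, here's roughly what "a JSON string inside a JSON string inside a JSON string inside an HTML form" looks like from the client side. It's a sketch only: the field names (`data`, `variables`, `payload`, `text`) are made up for illustration and are not the real Messenger ones.

```python
import json
from urllib.parse import parse_qs, urlencode


def wrap(message_text: str) -> str:
    """Encode a message as three nested JSON strings inside a form body."""
    inner = json.dumps({"text": message_text})   # JSON level 3
    middle = json.dumps({"payload": inner})      # level 2 carries level 3 as a string
    outer = json.dumps({"variables": middle})    # level 1 carries level 2 as a string
    return urlencode({"data": outer})            # form-encoded request body


def unwrap(form_body: str) -> str:
    """Reverse wrap(): peel the form encoding, then each JSON layer in turn."""
    outer = json.loads(parse_qs(form_body)["data"][0])
    middle = json.loads(outer["variables"])
    inner = json.loads(middle["payload"])
    return inner["text"]
```

Every layer re-escapes the quotes and backslashes of the layer below it, which is part of why the payload balloons on the wire.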

Also, first the author discovers a GraphQL endpoint. I thought great, just introspect the schema instead of blindly poking around!

But no, the GraphQL API seems to only be used to transfer Project LightSpeed payloads, which are server-generated JS snippets meant to be blindly executed to update the UI. Rough. I know GraphQL originates at Facebook but I don't think that was the intended usage...

Also, the author's python script seems to be opening a new session on every invocation. This might have triggered the account suspension, since login endpoints are certainly amongst the most heavily monitored for suspicious behavior.
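
One way to avoid logging in on every run is to persist cookies between invocations. This is a minimal standard-library sketch and not the author's script: the cookie file path is a placeholder, and whether Messenger's session cookies can be reused this way is an assumption.

```python
import http.cookiejar
import os
import urllib.request


def load_cookie_jar(path: str) -> http.cookiejar.LWPCookieJar:
    """Open a cookie jar backed by `path`, loading saved cookies if present."""
    jar = http.cookiejar.LWPCookieJar(path)
    if os.path.exists(path):
        # Session cookies are normally discarded on save/load; keep them.
        jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def opener_with_cookies(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    """Build a urllib opener that reads and writes cookies through `jar`."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

After a run, call `jar.save(ignore_discard=True, ignore_expires=True)` so the next invocation picks up the same session and only logs in again when the saved cookies stop working.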


My experience at FAANG was that every major service had like 9 layers of abstraction or had to traverse several hops to do the actual thing. I've kinda become jaded and feel like most things that make money are ugly, whether it's a giant plate of spaghetti served up by some start-up trying to be the first to market, or a Lovecraftian horror that is only exceeded in scale by its unknowability, like what OP posted. The end result is the same... 40 API calls to submit a form :'''(


This amazes me with just how bloated and inefficient simple things have become.

In ~4 decades we went from literal 'netcat chat' where your message may be shorter than TCP/IP headers most of the time, to slightly more featureful but still simple with IRC, and then the IM protocols of the 90s/2000s with even more features but usually quite tame (speaking from experience of working with MSNP). After that it started getting worse with XMLification, and now the web-based stuff.

But those last two requests aren’t API requests, they’re just requests to fetch images!

...that naturally raises the question of "what images? I didn't send any."

It's horrifying to see a <40B message get turned into 730B. I suspect that the majority of the time, the actual message content you send will be dwarfed by overhead.
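
To put numbers on that, taking 40 bytes as the upper bound for the message and 730 bytes as the size on the wire (both figures from above):

```python
def overhead(payload_bytes: int, wire_bytes: int) -> tuple[float, float]:
    """Return (expansion factor, payload's share of the bytes sent)."""
    return wire_bytes / payload_bytes, payload_bytes / wire_bytes


factor, share = overhead(40, 730)
print(f"{factor:.2f}x expansion, payload is {share:.1%} of what is sent")
```

That works out to about an 18x expansion, with the message itself making up roughly 5.5% of the bytes sent, and less for shorter messages.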


Messenger conversations are persistent, so a more apples to apples comparison would be against emails.


I was building a Google Analytics looking admin interface for a company and I kept glancing over at GA requests/responses for guidance. It would send a ton of redundant data over for each 'next' table rows like the column headings and some styling stuff. It wasn't ugly but it was way heavier than needed. So I did similar and it worked great.

Update: Looking at the mostly unusable GA4, their 'next' table row responses are much more concise. The only problem now is that the entire application is a pile.


40B message transmuted into 730B of meta-data gold.

Pretty good ROI! If it's free you're the product (et cetera)


I miss the good old days when FB Messenger was usable with ordinary XMPP clients.

Platforms should be forced by law to open themselves to third party clients ffs. Yes, spam will be a problem - but seriously: FB is big enough to afford investigation teams that cooperate with law enforcement to tackle the spam at the source.


That's almost what the EU's Digital Markets Act will require them to do: https://www.theverge.com/2022/3/24/22995431/european-union-d...


The FB Messenger XMPP client gateway was never very good or complete. Certainly it was functional enough to use for basic chats, but no heavy user was ever going to be happy with it unless Facebook put a lot more work into it... So they opted to turn it off instead and make us write our own.


So the government will get to decide whose business is defined as a 'platform' before they enforce this requirement of maintaining an API suitable for 3rd party clients? Do they also decide what features are required to be in that API? I guess on a case by case basis?

Software in the pharma industry works kind of like this. That's why it can take months to years to make a tiny one line change to an application once it's been validated by the FDA.


At some point a public company becomes a utility. I'd argue that Meta and Google are at this point, albeit for different reasons.

The specifics are obviously difficult to reason about, but the scale of network effects and/or amount of internet communities which only exist as facebook groups or whatsapp groups and would simply be gone if either stopped existing is... troubling.


At the point that a public company becomes a utility, why not nationalize it? The alternative being presented here is that private companies should be legislated and regulated into behaving in a particular way which is an incredibly inefficient way to manage a public good.


Honestly in practice I'm not sure there's much difference. A "nationalized" utility would still be regulated by legislation to behave "correctly" when operated by the executive branch. See e.g. USPS.


I think the difference between the USPS--which has laws preventing it from making a profit--and a US-based tech firm that profits by selling user data to advertisers would be a massive one.


Matrix has a bridge that interfaces (very well, might I add) with Messenger. I haven't looked at the code, but some control messages suggest the bridge uses an MQTT listener to connect to Messenger.


Fun fact: there is a 'wishlist'-priority task sitting in FB's Tasks system with detailed plans for how to change the MQTT API to break third-party clients should any of them become too popular.


Did you learn about this first hand or is there any source you can cite?


There's nothing I can cite. Saw it myself many years ago. I doubt many of the people who worked on that iteration of Messenger are even there any more, but it shows the attitude of the organization as a whole.


> FB is big enough to afford investigation teams that cooperate with law enforcement to tackle the spam at the source.

Same can be said about Google, Apple, Microsoft, etc

Yet spam still exists. How is that possible if your assumption were true?


Lack of incentives due to costs being externalized.

It's not that they can't, it's that they don't want to enough.


Their profit motives are entirely aligned with reducing spam.

I don’t understand why people always think hard problems aren’t solved because of some deep conspiracy. Sometimes they’re just hard.


> I don’t understand why people always think hard problems aren’t solved because of some deep conspiracy.

I didn't allege any conspiracy. I alleged that it isn't done because it isn't profitable enough.

Moderating is expensive as fuck, so platforms do the bare minimum required by law. Facebook makes 23 billion dollars net profit a year, Microsoft 72 billion dollars, Google/Alphabet around 60 billion dollars.

Even investing just 20% of these insane amounts of money into a joint effort to combat scams and cybercrime at the source (both by assisting law enforcement and by lobbying for laws) would put a serious dent into cybercrime. Imagine what someone like Pierogi of Scammer Payback, Trilogy Media and the other scambaiters could do if they had an actual budget, and now imagine what actual law enforcement could achieve if they had the resources to actually Follow The Money.


Mind elaborating on why you think that should be a law?


Because platforms intentionally move away from their open beginnings once they gain traction and a moat. Facebook's removal of XMPP was the first, then Twitter came along with restricting features to only the official client (polls and media in DMs) years ago, and now it's cat and mouse, with Twitter kicking off apps with zero notice.

Even Reddit, which has historically been extremely open towards third party apps, now gates features - RIF for example can't integrate with rewards, which is IMO pretty moronic as it could be a pretty good revenue stream for Reddit.

Capitalism has failed, and it's high time governments step in to break open the walls of the gardens.


I’m not saying that walled gardens aren’t bad, but they’re the property of those organisations. Even if you wanna start bashing holes into problematic gardens, they’d be at the bottom of the list; cloud providers being more open (or at least consistent) would massively reduce human time and cost.


Property is a man-made idea. No reason to treat it as something sacred. And we already don't care about property when the government bails out these crooks.


On that basis, hand over the keys to your house and car and kindly vacate promptly.


    "Sending automated (or non-automated) spam to other users
     Downloading people's data without their consent
     Putting undue load on infrastructure you are not paying for"
This applies in both directions.

     Sending automated (or non-automated) unsolicited ads/content to users
     Harvesting users' data without their consent
     Putting undue load on computers/devices/network connections you are not paying for


> Remember: Reverse engineering is ethical, pro-democratic, and protected under US law, but you still need to exercise integrity and responsibility when interacting with any online system.

US law is rarely simple enough to summarize in a sentence. This is no exception.

hiQ Labs, Inc. v. LinkedIn Corp might provide a useful sampling of things to consider.

> This case has been a litigation odyssey of sorts, to the Supreme Court and back: it started with the original district court injunction in 2017, Ninth Circuit affirmance in 2019, Supreme Court vacating of the order in 2021, Ninth Circuit issuing a new order in April 2022 affirming the original injunction, and back again where we started, the lower court in August 2022 issuing an order dissolving the preliminary injunction, and the most recent mixed ruling on November 4th, 2022.

> It certainly has been one of the most heavily-litigated scraping cases in recent memory and has been closely followed on our blog. Practically speaking, though, the dispute had essentially reached its logical end with the last court ruling in November – hiQ had prevailed on the Computer Fraud and Abuse Act (CFAA) “unauthorized access” issue related to public website data but was facing a ruling that it had breached LinkedIn’s User Agreement due to its scraping and creation of fake accounts (subject to its equitable defenses).

https://www.natlawreview.com/article/hiq-and-linkedin-reach-...


I love how sometimes the email address is protected and other times it's not.


(2021)


I wonder: if you try to log in on another device, doesn't it ask you to approve the login?



