Adventures in WhatsApp DB – extracting messages from backups, with code examples (medium.com)
151 points by walterbell 14 days ago | 33 comments

Oh, I wish this had been written a few months ago, when I had an old phone and a new phone each with messages the other didn't have.

I eventually managed to pull the sqlite databases while each phone was unlocked, merge them, put the merged database back, and convince WA to read it rather than overwrite it from the network.

Several hours of my time, and almost certainly not worth it. Reminded me how bad and annoying platform lock-in is though.
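For anyone attempting the same thing, the merge step described above might look roughly like the sketch below. It is only an illustration: the `messages` table and its `msg_id`, `sent_at` and `body` columns are hypothetical stand-ins, not WhatsApp's real schema (which differs by platform and version), and it says nothing about getting the app to accept the result.

```python
import sqlite3

def merge_messages(old_path, new_path, merged_path):
    """Copy rows from two SQLite files into a third, skipping duplicates.

    Assumes a hypothetical `messages` table keyed by `msg_id`;
    the real WhatsApp schema is different.
    """
    con = sqlite3.connect(merged_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "msg_id TEXT PRIMARY KEY, sent_at INTEGER, body TEXT)"
        )
        for alias, path in (("old_db", old_path), ("new_db", new_path)):
            # ATTACH makes the other file queryable from this connection.
            con.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
            # INSERT OR IGNORE drops rows whose msg_id is already present.
            con.execute(
                "INSERT OR IGNORE INTO messages "
                f"SELECT msg_id, sent_at, body FROM {alias}.messages"
            )
            con.commit()  # DETACH is not allowed inside an open transaction
            con.execute(f"DETACH DATABASE {alias}")
        return con.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    finally:
        con.close()
```

Keying on a unique message ID is what makes the merge safe to rerun; without such a column you would have to deduplicate on something weaker, like sender plus timestamp plus text.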

I wish it had been written a couple of years ago, when I lost all the messages I had on my Windows Phone. The official WhatsApp guide[1] reads to this day "at this time it is not possible to transfer your chat history to an Android or iPhone", which I can't interpret as a good faith error no matter how hard I try.

[1] https://faq.whatsapp.com/en/wp/28060005

WhatsApp still cannot transfer message backups between Android phones and iPhones. I switch between both with new devices every 1 or 2 years and it is always a pain losing the messages.

Export via email is a possible solution if you care about the messages but not about where they are displayed.

Why do you need to save messages 1 to 2 years old?

Why should I not have them in the event that I want them?

Do you delete all your emails and wipe your contacts when you replace a device too?

1) Your contacts are saved elsewhere.

2) Yes, I delete emails I don't need anymore. Almost all WhatsApp content is not important.

3) I upload pictures/videos to my Nextcloud NAS after which I delete them. Goes automatically.

But 1 and 2 are 'yes, I do want to keep things'. All I'm saying is WhatsApp (and similar) make that a pain, deliberately to try to lock you in, and it's annoying.

With regard to 1) they are simply saved elsewhere, in your Contacts. You can import/export that.

It is a sacrifice of user-friendliness for the sake of security. They could've easily made a backup function which does not (solely) involve Google Drive. They opted not to.

However there's an upside to this as well: if you delete that key, you're done for. If your discussion partner(s) do the same, the data between you and your discussion partners is also done for.

I agree with you, but I don't find it difficult to work around it, and the previous solutions (plaintext, unencrypted logs) were terrible.

It's not possible because they use different frameworks on each operating system. On iOS, they use Apple's Core Data, which is optimal for what they need, but is not portable as Core Data is closed source. That's not to say that there cannot be a method to do this, just that this is the first major hurdle they'd hit.

This is written by someone who doesn't know what Core Data is, or how it persists its objects. It's funny to see how they try to guess what various columns are and how data is formatted. They could have just imported the database file into Core Data, extracted the model from there, loaded the entities, run sophisticated queries in an acceptable syntax, etc.

Or maybe this was written by someone who was more interested in showing how to use sqlite and pandas directly to do this?

On the other hand I find it interesting that Apple decided it was important to create some shim on top of sqlite with their own conventions, but said shim is not so convoluted it can't be reverse-engineered. Can't really decide if it's good or bad.
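To give a feel for how approachable the raw file is, here is a minimal sketch that lists every table and its columns using nothing but Python's built-in `sqlite3`. The `ZMESSAGE` table and its `ZTEXT`/`ZDATE` columns in the demo are made up for illustration (only the `Z_PK` column name comes from the article); a real Core Data store will have its own entity tables.

```python
import sqlite3

def describe_tables(con):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Table names cannot be bound as parameters, so quote them instead.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = con.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [column[1] for column in columns]
    return schema

# Demo on an in-memory database with a hypothetical Core Data-style table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ZMESSAGE (Z_PK INTEGER PRIMARY KEY, ZTEXT TEXT, ZDATE REAL)"
)
print(describe_tables(con))
```

Pointing `sqlite3.connect` at a copy of the backup file instead of `:memory:` gives the same overview for the real database, which is about as far as you can get without guessing at column meanings.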

Core Data is an object-graph persistence framework, not a database. As said in the other comment, Core Data is an implementation detail, and there are other backends to choose from.

Apple adds additional metadata on top of the entities themselves in order to provide smart versioning and migration utilities. The entire object model is persisted inside the database, so that lightweight and heavyweight migration patterns can be deduced the next time the database is read. This metadata can also assist with opening WhatsApp's database and parsing it using Core Data.

Core Data actually supports multiple backends and SQLite is only one of them (probably the most popular, though).

> [...] discover that WhatsApp doesn’t work on it due to a weird message
> "Your phone date is inaccurate! Adjust your clock and try again."
> I have no idea what do date issues have to do with not starting WhatsApp but fixing the date and time didn’t help [...]

Some network protocols use timestamps for security. For example, OAuth 1.0 HMAC used timestamps as part of its mechanism for preventing replay attacks.

I have no particular insight into what WhatsApp is doing, though.
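To make the general idea concrete, here is a simplified sketch of timestamp-based replay protection. It is not WhatsApp's protocol and not a full OAuth 1.0 implementation (that scheme also involves a nonce and a specific signature base string); the `sign`/`verify` names, the payload layout, the SHA-256 choice and the 300-second tolerance are all assumptions made for illustration.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed tolerance; real services pick their own

def sign(secret: bytes, message: bytes, timestamp: int) -> str:
    """Sign the message together with the time it was sent."""
    payload = str(timestamp).encode() + b"." + message
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret, message, timestamp, signature, now=None):
    """Accept only correctly signed requests with a recent timestamp."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old (possible replay) or clock badly off
    expected = sign(secret, message, timestamp)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

With a check like this on the server side, a client whose clock is far off would have every request rejected even though its signatures are otherwise valid, which is one plausible way a wrong phone date could surface as an error.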

As you say, that should be the reason, but in most cases documented online, neither fixing the date and time nor updating WhatsApp to the latest version actually clears the error message...

And one more thing: under any good programming practice, this type of error shouldn't prevent WhatsApp from starting, only from communicating with the network...

As someone who keeps switching between iOS and Android phones as my primary daily driver, I am still waiting for a way to effectively sync my WhatsApp messages over when I pick up a phone that doesn't run the same OS as my previous one.

Does iOS WhatsApp not support backing up to Google Drive like the Android version does?

The iOS app backs up to iCloud and the Android version to Google Drive; both are locked down, with no other choice.


"Reading around it seems that Apple, in their infinite unique wisdom have decided to use dates starting from 1.1.2001 on iPhone so let’s see what happens if we add an offset ...."

This did put a smile on my face. The author forgets or doesn't know that WhatsApp is Facebook's app and has nothing to do with Apple; the fact that he used it on an iPhone is irrelevant. On Android, WhatsApp's local database under /data/data is the same.

2nd smile, upon re-reading. Quote:

"Let’s see what we’ve got. The below is my analysis based on the data I’ve found in the tables in my own DB and my inferences about it: 1. Z_PK — seems like a serial number..."

Yup, it is a serial number. Those who, like me, have dreamed in SQL for the past 20+ years will also call it a primary key. This article is getting better with every re-read :)

you just put a smile on my face :) did you miss the last ~50 years of using unix timestamps on almost all platforms? for your reading enjoyment: https://developer.apple.com/documentation/foundation/nsdate https://developer.android.com/reference/java/sql/Timestamp.h...
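For reference, the offset the quoted passage talks about is small enough to sketch. NSDate's reference date is 00:00:00 UTC on 1 January 2001, so a value stored as seconds since that date needs a fixed shift to become a Unix timestamp. The function names below are made up, and the sketch assumes the stored column really is in seconds relative to that reference date.

```python
from datetime import datetime, timedelta, timezone

# NSDate reference date: 00:00:00 UTC on 1 January 2001.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Seconds between the Unix epoch (1970-01-01) and the Apple reference date.
APPLE_TO_UNIX_OFFSET = 978307200

def apple_to_unix(seconds: float) -> float:
    """Convert seconds since the Apple reference date to a Unix timestamp."""
    return seconds + APPLE_TO_UNIX_OFFSET

def apple_to_datetime(seconds: float) -> datetime:
    """Convert seconds since the Apple reference date to an aware datetime."""
    return APPLE_EPOCH + timedelta(seconds=seconds)

# Example: one day after the reference date.
print(apple_to_datetime(86400).isoformat())
```

If the shifted values land in a sensible date range for your chats, the assumption about the epoch is probably right; if they come out decades off, the column is likely stored in some other unit or epoch.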

On a wider scale, this is a step in the right direction for getting away from the lock-in of these kinds of platforms, which offer E2E encryption but still keep the data in proprietary formats.

Would this work for Android?

At least in the past, the Android WhatsApp database was encrypted, and you needed either a rooted phone or some shenanigans requesting the key from the WhatsApp service.

(the Google Drive backup of it is not encrypted, but ... it's also not accessible to you! Facebook and Google are dealing your data behind your back, but don't believe you have any rights to access this data except through the tiny WhatsApp app window)

The local database inside /data/data/ is not encrypted. However, backups (both local backups and Drive backups) are encrypted with a key generated on WhatsApp's servers when you log in.

You can access the Google Drive backup with this third-party tool, which admittedly is less than ideal: https://github.com/YuriCosta/WhatsApp-GD-Extractor-Multithre...

I'll add this link on GitHub - thanks!

re: original question: AFAIK since Android 8 there is encryption of everything exported.

Nice work, but it's appalling that this had to be done to simply export some messages. Doesn't the GDPR require this for WhatsApp already?

When I last looked into it they had an export feature in the app that basically compiled a bunch of text/HTML files on the device containing conversations and (optionally downscaled) image media. The export from the device itself was done through a mail client of your choosing.

There's no easy way to automate/schedule this, and you have to do it one conversation at a time. Long-running multiyear conversations produce unwieldy multi-MB files. They're not even doing the bare minimum to make data portable, as I see it.

I try to hold on to all digital communications with people I know (as opposed to service providers/professional interactions/strangers on the internet), and preserve it on my own hardware in portable formats. The idea is to have the digital equivalent of the box in the attic full of letters. Maybe that record will be interesting to me in a few decades, or to my descendants - or maybe not, which is fine too. But either way it would be a damn shame if we all collectively lost the "boxes of letters in the attic" to digitization, and these proprietary platforms aren't helping.

Don't think GDPR applies since the personal data is stored on your device
