Introduction to Sentry Symbolicator

the_mitsuhiko · on April 5, 2023

Figured it might be fun to share this again. It's a standalone symbolication service for minidumps, native stack traces and more that we use behind the scenes at Sentry for a lot of crash reports. It does, however, see quite a bit of use outside of Sentry.

It's written in Rust, originally in actix, today in axum.

js2 · on April 5, 2023

If this had existed around 2014, it sure would have saved me a lot of time building a solution in Python that used Breakpad symbol files under the hood to symbolicate PLCR reports. iOS, as I'm sure you know, turned out to be the easy part. Minidumps, especially from Android, are a PITA.
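Breakpad symbol files are plain text, and the heart of symbolicating a frame against one is a sorted-range lookup over the FUNC records. A minimal sketch of that idea follows, assuming a hand-written excerpt (the module name and debug ID are made up) and ignoring line records and PUBLIC fallbacks. It is not how Symbolicator or any particular tool implements it.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical excerpt of a Breakpad .sym file (module name and debug ID are made up).
SYM_FILE = """\
MODULE Linux x86_64 0123456789ABCDEF0123456789ABCDEF0 libdemo.so
FILE 0 demo.c
FUNC 1000 30 0 init_state
1000 30 12 0
FUNC m 1040 50 0 crash_here(int)
1040 50 27 0
PUBLIC 2000 0 exported_helper
"""


def parse_funcs(text: str) -> List[Tuple[int, int, str]]:
    """Collect (start, size, name) from FUNC records; addresses and sizes are hex."""
    funcs = []
    for line in text.splitlines():
        if not line.startswith("FUNC "):
            continue  # skip MODULE, FILE, PUBLIC and line records
        rest = line[len("FUNC "):]
        if rest.startswith("m "):
            rest = rest[2:]  # optional flag: several symbols share this address
        addr, size, _param_size, name = rest.split(" ", 3)
        funcs.append((int(addr, 16), int(size, 16), name))
    funcs.sort()
    return funcs


class SymbolTable:
    def __init__(self, text: str):
        self.funcs = parse_funcs(text)
        self.starts = [start for start, _, _ in self.funcs]

    def lookup(self, rel_addr: int) -> Optional[str]:
        """Find the FUNC whose [start, start + size) range contains a module-relative address."""
        i = bisect.bisect_right(self.starts, rel_addr) - 1
        if i < 0:
            return None
        start, size, name = self.funcs[i]
        return name if rel_addr < start + size else None

    def symbolicate(self, pc: int, module_base: int) -> Optional[str]:
        """Turn an absolute program counter into a name, given where the module was loaded."""
        return self.lookup(pc - module_base)
```

For example, `SymbolTable(SYM_FILE).symbolicate(0x7f0000001050, 0x7f0000000000)` resolves to `crash_here(int)`. A real pipeline first has to pick the right module for each address, and would fall back to line records and PUBLIC symbols for finer or coarser answers.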

AFAIK there's still no good way to get a useful Android stack trace that starts in ART and crashes inside native code, other than unwinding on the device itself. But even that is a crapshoot.

The thing that worked the best for us was pulling the stacks that Android dumps into logcat just before terminating the process.
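Pulling those stacks out of logcat mostly comes down to recognising the frame lines the crash handler prints. The sketch below assumes a frame shape of `#NN pc <hex offset>  <path> (symbol+offset) (BuildId: <hex>)`, with the last two parts optional; the exact format has varied across Android versions, so treat the regex as an assumption to check against your own logs. The `Frame` type and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed frame shape: "#NN pc <hex offset>  <path> [(<symbol>+<off>)] [(BuildId: <hex>)]"
FRAME_RE = re.compile(
    r"#(?P<idx>\d+)\s+pc\s+(?P<pc>[0-9a-fA-F]+)\s+(?P<path>\S+)(?P<rest>.*)$"
)
BUILD_ID_RE = re.compile(r"\s*\(BuildId: (?P<build_id>[0-9a-fA-F]+)\)")


@dataclass
class Frame:
    index: int
    rel_pc: int  # as far as I know, an offset within the library, not an absolute address
    path: str
    symbol: Optional[str]
    build_id: Optional[str]


def parse_frame(line: str) -> Optional[Frame]:
    m = FRAME_RE.search(line)
    if m is None:
        return None
    rest = m.group("rest")
    build_id = None
    b = BUILD_ID_RE.search(rest)
    if b is not None:
        build_id = b.group("build_id")
        rest = rest[: b.start()] + rest[b.end():]
    rest = rest.strip()
    # Whatever parenthesised text remains is treated as "symbol+offset"; this sketch
    # does not distinguish other annotations such as APK "(offset 0x...)" markers.
    symbol = rest[1:-1] if rest.startswith("(") and rest.endswith(")") else None
    return Frame(int(m.group("idx")), int(m.group("pc"), 16), m.group("path"), symbol, build_id)


def parse_backtrace(lines: Iterable[str]) -> List[Frame]:
    """Keep every logcat line that looks like a backtrace frame, in order."""
    return [f for f in (parse_frame(line) for line in lines) if f is not None]
```

Frames without a symbol (common for stripped app libraries) come back with `symbol=None`; their `rel_pc` and `build_id` are what you would feed into a symbolication step afterwards.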

I really don't understand why Apple and Google both make it so hard to get a useful stack trace. Especially when the OS itself is typically already capturing it out of process.

the_mitsuhiko · on April 5, 2023

Android is just pure frustration, particularly when working with memory dumps, as there is basically no way to get reliable system symbols. This is already an issue on iOS (and modern macOS), since Apple refuses to provide symbol servers, so we have to manually harvest as many symbols as we can from the dyld shared caches.

Generally it seems like most companies have no interest in making it easy, or even possible, to get stack traces in production. Google clearly runs into this themselves and has some support for it in Android, but they don't like to share it. Even today we ship multiple different unwinders on Android to cover all bases.
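To give a sense of why one unwinder is not enough: the simplest strategy is walking the frame-pointer chain, which only works when every function on the stack was built with frame pointers. A minimal sketch for AArch64, assuming the AAPCS64 frame record layout (`[fp]` holds the caller's fp, `[fp + 8]` the return address) and modelling captured stack memory as a hypothetical dict of 8-byte words. This is not Sentry's unwinder.

```python
from typing import Dict, List


def unwind_frame_pointers(pc: int, fp: int, memory: Dict[int, int], max_frames: int = 64) -> List[int]:
    """Walk AArch64 frame records: [fp] = caller's fp, [fp + 8] = return address."""
    frames = [pc]
    while fp != 0 and len(frames) < max_frames:
        try:
            caller_fp = memory[fp]
            return_addr = memory[fp + 8]
        except KeyError:
            break  # frame record lies outside the captured stack memory
        frames.append(return_addr)
        if caller_fp != 0 and caller_fp <= fp:
            break  # stack grows down, so callers' records must sit at higher addresses
        fp = caller_fp
    return frames
```

The chain breaks as soon as one function skips the frame record (leaf functions, or libraries compiled without frame pointers), which is where CFI-based and heuristic stack-scanning unwinders come in. Symbolicators also usually look up `return_addr - 1` so the address lands inside the call instruction, and on hardware with pointer authentication the signature bits in saved return addresses have to be stripped first.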

js2 · on April 5, 2023

Harvesting iOS symbols is pretty easy though compared to Android. At one point I tried crowd-sourcing Android symbols from employee phones but that turned out not to be helpful.

For iOS (I know I'm telling you stuff you already know) you used to have to plug an Apple device into a Mac running Xcode, let Xcode copy over the symbols, and then upload those to your internal symbol service. But at some point Apple at least stopped encrypting the firmware, so you could skip the Xcode step and extract the symbols directly from the firmware. So various folks have just been caching copies of them, e.g.:

https://github.com/CXTretar/iOS-System-Symbols-Supplement

the_mitsuhiko · on April 5, 2023

It mostly works, but then there are some OTA-only releases, which you need to get the symbols from. Sadly, the actual support for reading and unpacking the various OTA releases has been pretty brittle. In the past we used a tool that loved to segfault; now we're trying this one to see if we have better luck: https://github.com/blacktop/ipsw

It's a bit of a cat and mouse game still.

saagarjha · on April 6, 2023

Fun fact: if you upload crash reports to Apple they will symbolicate it against their private symbols, which are richer than anything that ships on-device.

the_mitsuhiko · on April 6, 2023

That answers a few things :-/

javierhonduco · on April 5, 2023

Thanks for sharing! I work in this space too (profilers, debuggers, etc) and have seen a good amount of interesting edge cases.

Are there any interesting bugs and edge cases that you could share? Always good to hear war stories from production :)

the_mitsuhiko · on April 5, 2023

> Are there any interesting bugs and edge cases that you could share?

Too many to count. The PDB format in particular has so many absurdities in practice that I'm quite impressed it is so well supported even outside of Microsoft's ecosystem. For me, one of the most fun aspects is that Microsoft's kernel tools, instead of fixing up PDBs when performing certain optimizations on the object files, chose to attach additional maps (OMAP) to the PDB to remap everything. That decision makes handling PDBs much trickier than it would otherwise have to be.
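To make the OMAP point concrete: an OMAP table is a sorted list of (source RVA, target RVA) records, and PDBs can carry one table per direction. A minimal sketch of a lookup through such a table, assuming the convention I understand existing PDB readers to use: take the last record at or below the address, treat a target of 0 as "no mapping", and otherwise carry over the offset from the start of the record. The class name is hypothetical and this is not taken from Symbolicator's code.

```python
import bisect
from typing import List, Optional, Tuple


class OmapTable:
    """Translate RVAs through an OMAP table of (source_rva, target_rva) records."""

    def __init__(self, records: List[Tuple[int, int]]):
        self.records = sorted(records)
        self.sources = [src for src, _ in self.records]

    def translate(self, rva: int) -> Optional[int]:
        i = bisect.bisect_right(self.sources, rva) - 1
        if i < 0:
            return None  # address lies before the first mapped range
        source, target = self.records[i]
        if target == 0:
            return None  # this range has no counterpart on the other side
        return target + (rva - source)
```

Every address coming out of a crash, and every symbol address coming out of the PDB, may need to pass through a table like this in the right direction before the two can be compared, which is a big part of what makes OMAP-bearing PDBs annoying to handle.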

In terms of actual bugs, there are too many to count. One of the more fun ones in recent history is that Apple's tools are running up against the limits of some remaining 32-bit offsets, even in 64-bit Mach-O files, and at times they just overflow and you're stuck.
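The comment does not say which offsets are involved, so as one illustration (my assumption, not necessarily the field behind the bug): the `LC_SYMTAB` load command (`struct symtab_command`) keeps its `symoff` and `stroff` file offsets as `uint32_t` even in 64-bit Mach-O files. The sketch below packs such a command and shows what happens once an offset passes 4 GiB; the function names are hypothetical.

```python
import struct

LC_SYMTAB = 0x2
# struct symtab_command: cmd, cmdsize, symoff, nsyms, stroff, strsize; all uint32_t
SYMTAB_COMMAND = struct.Struct("<6I")
U32_MAX = 0xFFFFFFFF


def pack_symtab(symoff: int, nsyms: int, stroff: int, strsize: int) -> bytes:
    """Serialize LC_SYMTAB, refusing values that a 32-bit field cannot hold."""
    fields = (LC_SYMTAB, SYMTAB_COMMAND.size, symoff, nsyms, stroff, strsize)
    for value in fields:
        if not 0 <= value <= U32_MAX:
            raise OverflowError(f"{value:#x} does not fit in a 32-bit Mach-O field")
    return SYMTAB_COMMAND.pack(*fields)


def truncate_u32(value: int) -> int:
    """What a writer that silently truncates would store instead of the real offset."""
    return value & U32_MAX
```

A symbol table starting at `0x100001000` cannot be described by this command at all; a writer that truncates would record `0x1000` instead, and a reader would then parse garbage from the start of the file.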

KRAKRISMOTT · on April 5, 2023

Why did you migrate from actix to Axum?

the_mitsuhiko · on April 6, 2023

The reasons are not particularly important. We were on an old version of actix-web and no longer used the actors. Folks wanted to evaluate different framework options and someone picked axum. I think it basically comes out the same. Tower adds some crazy type magic that can result in awful compiler errors, which isn't fun.

josegonzalez · on April 5, 2023

Maybe I'm missing something, but how is one supposed to use this? I see a command to run a server, but is there maybe a short tutorial on how it's supposed to be used by engineers in their day-to-day?

the_mitsuhiko · on April 5, 2023

It's useful if you have minidumps or structured stacktraces and you want to symbolicate them at scale. Once it runs, you can hit it with API requests to get out what you want: https://getsentry.github.io/symbolicator/api/
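To give a rough idea of what "hitting it with API requests" looks like, here is a sketch that assembles a JSON request for symbolicating a structured stack trace. The endpoint path and the `sources`/`modules`/`stacktraces`/`instruction_addr` field names reflect my reading of the API docs linked above and should be checked there; the server address (`http://127.0.0.1:3021`) and all function names are assumptions.

```python
import json
import urllib.request

# Assumption: a locally running Symbolicator; adjust to your deployment.
SYMBOLICATOR_URL = "http://127.0.0.1:3021"


def build_payload(frames, modules, sources):
    """Assemble a /symbolicate request body (field names per my reading of the API docs)."""
    return {
        "sources": sources,  # where to look for debug files (symbol servers, buckets, ...)
        "modules": modules,  # loaded images: type, debug/code IDs, load address, size
        "stacktraces": [
            {"frames": [{"instruction_addr": hex(addr)} for addr in frames]}
        ],
    }


def build_request(payload, base_url=SYMBOLICATOR_URL):
    """Wrap the payload in a POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/symbolicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def symbolicate(payload, base_url=SYMBOLICATOR_URL):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(payload, base_url)) as resp:
        return json.load(resp)
```

The exact shape of module and source entries (debug IDs, layouts, credentials) is the part most worth copying from the docs rather than from this sketch. As far as I recall, the docs also cover submitting minidumps directly instead of pre-extracted stack traces.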

> how it's supposed to be used by engineers in their day-to-day?

Most people would not use tools like this day-to-day unless they build tools themselves that need to process this at scale. If you are building a profiler, a debugger or something similar, for instance, you might find this useful.

LegNeato · on April 7, 2023

A bit of mobile history: Facebook was the first to get iOS symbolication working on Linux, and they open sourced it: https://github.com/facebookarchive/atosl. There were rumors that another company had done it but wouldn't release it because they saw it as a competitive advantage (others needed Mac minis; they could use cheap Linux boxes).