
Building Sentry, a service to process native crash reports and minidumps - daniel_levine
https://blog.sentry.io/2019/06/13/building-a-sentry-symbolicator
======
etaioinshrdlu
Sentry is one of the nicest services I've had the pleasure of using. Having
our errors centrally logged and managed is invaluable.

Source: a happy user.

~~~
PikachuEXE
We use Sentry for Rails & JavaScript and we are happy with it. Haven't tried
with CSP (Content Security Policy) yet but it seems great too (from reading
their blog post)

------
robocat
We have used Sentry for a long time with JavaScript. The main issues for us
are:

* Obese JavaScript code. We had to write our own custom code to log events.

* Aimed at large scale companies. We only have 1000s of users, and we care about each individual exception, but I think it is really aimed at consolidating large numbers of events.

* Meaningless percentages on data. Tagged data is processed, but the end percentage value has little meaning e.g. send through 1000 similar events, with 1 event with a tag with value X, and 1 event with a tag with value Y, and 998 with no value. Sentry reports 50% X and 50% Y!
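
To make the arithmetic in the last point concrete, here is a minimal sketch of the two possible denominators. It is not Sentry's code: the event shape, the `flag` tag, and the `tag_percentages` helper are all assumptions made for illustration.

```python
from collections import Counter


def tag_percentages(events, tag, over_all_events=True):
    """Return {value: percent} for one tag across a list of event dicts.

    Hypothetical helper: each event is assumed to look like {"tags": {...}}.
    """
    values = [e["tags"][tag] for e in events if tag in e.get("tags", {})]
    counts = Counter(values)
    # The choice of denominator is the whole point of the example.
    denominator = len(events) if over_all_events else len(values)
    if denominator == 0:
        return {}
    return {value: 100.0 * n / denominator for value, n in counts.items()}


# 1000 similar events: one tagged X, one tagged Y, 998 without the tag.
events = [{"tags": {"flag": "X"}}, {"tags": {"flag": "Y"}}]
events += [{"tags": {}} for _ in range(998)]

tagged_only = tag_percentages(events, "flag", over_all_events=False)
all_events = tag_percentages(events, "flag", over_all_events=True)
```

Dividing by the tagged events only gives the 50% / 50% split described above; dividing by all 1000 events gives roughly 0.1% each, which is closer to what the numbers actually mean.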

But they have given us really excellent service, especially given we are not
paying enterprise rates.

Edit: also we are not in a US timezone, which makes the UI weird. And I do
love the email integration: have a bug, get an email, fix it.

~~~
the_mitsuhiko
> * Obese JavaScript code. We had to write our own custom code to log events.

For what it's worth we spent a lot of time to reduce bundle sizes recently.
It's ultimately a tradeoff between how rich and complete the data is one wants
to capture (and from how many browsers) and how big one wants to have the
bundle :(

~~~
StavrosK
Can you make a "lite" module for people who want basic exception handling,
perhaps?

~~~
the_mitsuhiko
In theory it should treeshake down quite well if you use esm modules. Depends
obviously on quite a few factors.

Additionally there is another version of the JS SDK dubbed "loader" which lazy
loads the real SDK on first use:
[https://docs.sentry.io/platforms/javascript/#lazy-loading-se...](https://docs.sentry.io/platforms/javascript/#lazy-loading-sentry)

------
xvilka
They might want to check radare2[1] for processing crash dumps, since it
supports all 3 major platforms (Windows, Linux, OS X), and allows playing with
stripped files as well.

[1] [https://github.com/radare/radare2](https://github.com/radare/radare2)

~~~
the_mitsuhiko
Armin from Sentry here: the feature set overlap between sentry's symbolicator
and radare2 is not that great. The goals are also very different. We only want
to unwind and symbolicate but we need to do this at scale. radare on the other
hand wants to disassemble and do all kinds of low level debugging and it's
built for user-attended interactive sessions.

~~~
xvilka
Some companies use r2 for batch jobs, moreover it is available as a library.
Anyway, just wanted to inform you, in case you didn't know.

------
rtpg
Sentry has been very good to us, and it’s a generally great business model to
boot! Overall great for the community and for ourselves.

I am going to whine a bit that the recent move over to the unified SDK has
been less than ideal for us. The fact that the raven docs would point us to
the unified SDK but not to a “how to migrate” page made me super unsure about
whether we were doing the right thing (esp. when it came to the logging
integrations on Python)

It’s kind of an interesting problem, providing SDKs for each language. Sentry
went with unifying the API across language boundaries and I’m not super happy
with the results but I don’t have like 30 packages to maintain

~~~
the_mitsuhiko
> Sentry went with unifying the API across language boundaries and I’m not
> super happy with the results but I don’t have like 30 packages to maintain

Yeah, that move and the docs did not go exactly as planned. There are a few
reasons why we did it: a) the old SDKs had no sensible state management which
caused endless issues such as incorrect breadcrumb collection in async code.
b) it's really hard for customer support to understand the number of SDKs.
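
As an illustration of the kind of state problem in (a), here is a minimal sketch of per-task breadcrumb storage using Python's `contextvars`. It is not the Sentry SDK's implementation or API; `add_breadcrumb`, `current_breadcrumbs`, and `handle_request` are hypothetical names. With a single module-level list, the two interleaved tasks below would see each other's breadcrumbs.

```python
import asyncio
import contextvars

# Hypothetical per-task breadcrumb store. An immutable tuple is used so that
# tasks never mutate a value shared with another context.
_breadcrumbs = contextvars.ContextVar("breadcrumbs", default=())


def add_breadcrumb(message):
    _breadcrumbs.set(_breadcrumbs.get() + (message,))


def current_breadcrumbs():
    return list(_breadcrumbs.get())


async def handle_request(name):
    add_breadcrumb(f"{name}: start")
    await asyncio.sleep(0)  # yield so the two tasks interleave
    add_breadcrumb(f"{name}: done")
    return current_breadcrumbs()


async def main():
    # gather() wraps each coroutine in a task, and each task runs in its own
    # copy of the current context.
    return await asyncio.gather(handle_request("a"), handle_request("b"))


results = asyncio.run(main())
```

Each task ends up with only its own two breadcrumbs, even though the tasks interleave at the `await`.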

We're working on improving that, in particular docs.

------
Operyl
The title cuts off “Symbolicator,” the specific name of the component here
which is slightly confusing.

~~~
daniel_levine
It was in the original title I submitted, but the admins must have changed it

------
sciurus
This is cool stuff! It's nice to see what Sentry can develop in this space
with the focus and resources that they have.

I handle ops for Mozilla's crash reporting pipeline for Firefox [0] and our
symbol server [1], among other things. I know our respective development teams
stay in touch, and I hope we can find a way to use symbolic/symbolicator to
simplify our stack.

[0]
[https://socorro.readthedocs.io/en/latest/](https://socorro.readthedocs.io/en/latest/)
[1]
[https://tecken.readthedocs.io/en/latest/](https://tecken.readthedocs.io/en/latest/)

------
SEJeff
I've used Sentry since Dave Cramer (Sentry's original author) was working back
at Disqus years ago. It's excellent software that fills a really important
niche. It is wonderful to see he managed to build a solid team and company
around it.

------
larrik
I really like sentry, but I'm sad that the URL scheme changed (from
sentry.io/<org name>/<project name>/ to
sentry.io/organizations/<org>/issues/?project=<meaningless int>)

~~~
zeeg
Yeah it's not great right now, but will likely change again.

e.g. sentry.io/issues/SEN-12345

We're also introducing a much more comprehensive event search which will require
event permalinks so we're sorting some of that out.

Feel free to throw additional feedback our way. Best place would be on
forum.sentry.io to make sure the team actually sees it.

------
scardine
Hey @the_mitsuhiko, any plans to support Django Channels (daphne) out of the
box? Debugging async stuff is tough.

~~~
the_mitsuhiko
There are no concrete plans. It's not clear to me what this would entail.

------
js2
I built Yahoo's in-house mobile app crash reporting tool a few years ago
(still in use). I used an on-premise install of Sentry as the UI. At the time,
Sentry didn't really support mobile error reporting, so I built something much
like what's detailed in this post and called it the Processor.

I regret never having made the time to open-source what I built. The Processor
is written in Python, takes reports from mobile devices, unwinds,
symbolicates, retraces, unminifies, etc as needed, then generates a Sentry
"event" and forwards that to our on-prem Sentry instance.

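The symbolication step mentioned above can be sketched as an address lookup in a sorted symbol table. This is a simplified illustration, not the Processor or Sentry's symbolicator: `SymbolTable`, `symbolicate`, and the sample addresses are made up, and a real implementation also has to map each frame to the right binary image and subtract its load address.

```python
import bisect


class SymbolTable:
    """Hypothetical symbol table: (start_address, size, name) entries."""

    def __init__(self, symbols):
        self._symbols = sorted(symbols)
        self._starts = [start for start, _size, _name in self._symbols]

    def lookup(self, address):
        # Find the last symbol whose start address is <= the given address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start, size, name = self._symbols[i]
        if address >= start + size:
            return None  # falls in a gap after the symbol
        return f"{name}+0x{address - start:x}"


def symbolicate(frames, table):
    # Unknown addresses are kept as raw hex so the report stays complete.
    return [table.lookup(addr) or f"0x{addr:x}" for addr in frames]


table = SymbolTable([(0x1000, 0x40, "main"), (0x1040, 0x20, "parse_report")])
stack = symbolicate([0x1048, 0x1010, 0x2000], table)
```

The binary search keeps each lookup logarithmic in the number of symbols, which matters once system libraries with many thousands of symbols are involved.
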
I also built the SDKs. For iOS, I used PLCrashReporter. These days I'd
probably use KSCrash. An important point here. On iOS, the unwinding is done
on the device. So all you have to do on the backend is symbolicate it. Another
point: it's relatively easy to get iOS system symbols. Plug an iOS device into
a Mac running Xcode and the symbols are transferred from the device to the
Mac. You can then harvest them however you need. In fact, Apple has apparently
stopped encrypting OTA updates so you no longer need an iOS device to get the
symbols:

[https://github.com/Zuikyo/iOS-System-Symbols](https://github.com/Zuikyo/iOS-System-Symbols)

For Android NDK crashes I've tried a few approaches and still don't have a
satisfying solution. Originally I went with breakpad + minidumps on the
device. On the backend, the Processor runs the breakpad stackwalker on the
minidump. Another important point: the unwinding is occurring on the backend
in this case, unlike iOS where it's done on the phone. (A minidump is
basically just a snapshot of all the thread stack memory, plus some extra
diagnostic info.) But to unwind reliably off-device you need the Android
system symbols (in addition to the app's symbols obviously). Well good luck
with that. Google makes the original Nexus Android OS images available so you
can harvest those but you'll never get symbols for all the various Android
devices. I built a tool that can harvest symbols off a device and tried to
crowdsource them from Yahoo's developers but it's not been very successful
(there's a lot of flavors of Android).

Another issue is that minidumps are relatively largish to deal with. So my
second approach was two-fold. I'm still using breakpad's crash handler on the
device, but I now have it generating the much smaller microdump format. In
addition, I've added libunwind to our Android SDK so that after capturing the
microdump, I attempt to unwind on the device (also collecting function names
during unwinding) and add that info to the report. The Processor then only
needs to unwind the microdump if the unwinding on the device failed. Otherwise
it just needs to symbolicate. This hasn't been wildly successful though.
Unwinding on an Android device is trickier than on an iOS device. Also, it's
almost impossible (well I haven't figured out how) to unwind through the
ART/Java frames that called into the native code.
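
The "unwind on the backend only if the device failed" rule described above can be sketched as follows. The report fields (`device_frames`, `microdump`) and the `unwind_microdump` callback are assumptions for illustration, not the Processor's actual data model.

```python
def choose_frames(report, unwind_microdump):
    """Prefer frames unwound on the device; fall back to the microdump.

    `report` is a hypothetical dict; `unwind_microdump` stands in for a
    backend stack walker that takes the raw microdump bytes.
    """
    frames = report.get("device_frames")
    if frames:
        return frames, "device"
    return unwind_microdump(report["microdump"]), "backend"


def fake_unwind(microdump):
    # Stand-in for a real stack walker, used only for this example.
    return [0x2000, 0x2040]


good = choose_frames({"device_frames": [0x1000], "microdump": b""}, fake_unwind)
failed = choose_frames({"device_frames": [], "microdump": b""}, fake_unwind)
```

Either way the resulting frames still go through symbolication; only the unwinding step is skipped when the device succeeded.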

Of course the vast majority of Android crashes are in Java code and these are
much easier to deal with. They are unwound just fine on the device so on
the backend you only need to deal with deobfuscating the ProGuard minification
which is easily done using the mapping file generated by ProGuard.
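
A minimal sketch of that deobfuscation step, restoring class names only. It assumes the usual mapping layout where class lines are unindented and look like `original.Name -> obfuscated.name:`; the helper names and sample data are made up, and in practice ProGuard's own retrace tool does this (including methods and line numbers).

```python
import re


def parse_class_mapping(mapping_text):
    """Return {obfuscated_class: original_class} from a mapping file's text."""
    classes = {}
    for line in mapping_text.splitlines():
        # Skip blanks, comments, and indented member lines (fields, methods).
        if not line or line[0].isspace() or line.startswith("#"):
            continue
        original, sep, obfuscated = line.partition(" -> ")
        if sep and obfuscated.endswith(":"):
            classes[obfuscated[:-1]] = original
    return classes


# Matches Java stack frames such as "    at a.b.c(SourceFile:12)".
_FRAME = re.compile(r"^(\s*at\s+)([\w.$]+)\.([\w$<>]+)\((.*)\)$")


def deobfuscate_classes(stack_trace, classes):
    out = []
    for line in stack_trace.splitlines():
        m = _FRAME.match(line)
        if m and m.group(2) in classes:
            prefix, cls, method, location = m.groups()
            # Method names stay obfuscated in this simplified sketch.
            line = f"{prefix}{classes[cls]}.{method}({location})"
        out.append(line)
    return "\n".join(out)


MAPPING = (
    "# sample mapping\n"
    "com.example.app.net.Client -> a.b:\n"
    "    int retries -> a\n"
    "    void connect() -> c\n"
)
TRACE = (
    "java.lang.IllegalStateException: boom\n"
    "    at a.b.c(SourceFile:12)\n"
    "    at java.lang.Thread.run(Thread.java:764)"
)
restored = deobfuscate_classes(TRACE, parse_class_mapping(MAPPING))
```

Frames whose class is not in the mapping, such as the `java.lang.Thread` frame, pass through unchanged.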

What's really annoying with native mobile crashes is that both Android and iOS
have their own services for both capturing crashes and unwinding on the
device. And because these are integrated with the OS and work out-of-process,
they are much more reliable than anything you can do in-process using
something like PLCR, KSCrash, libunwind, etc.

But, neither OS gives an app access to its own system generated reports. All
you get is the lame reports the devices upload to Google Play Console / iTunes
Connect.

Anyway, thank you to Sentry for providing such a great product and I'm sorry
again I wasn't able to contribute more. I'm not sure what I built would work
at your scale. It's interesting we ended up with similar designs.

~~~
the_mitsuhiko
> Another point: it's relatively easy to get iOS system symbols. Plug an iOS
> device into a Mac running Xcode and the symbols are transferred from the
> device to the Mac.

Indeed. But sadly Apple does not provide a symbol server like Microsoft does.
We are maintaining our own internally. I wish we could open it up to the world
but I'm pretty sure that it's not legal to redistribute these.

> For Android NDK crashes I've tried a few approaches and still don't have a
> satisfying solution.

That is indeed overall a pretty frustrating situation. It's similar for linux
in general where it's really hard to get all the debug symbols collected. And
even if debug symbols exist, they are not stored like you would expect from a
symbol server. Very frustrating.

I'm quite annoyed that there is so little support from the platform holders to
provide production debugging APIs. One would think there is a higher demand
for this :(

~~~
js2
> Apple does not provide a symbol server like Microsoft does. We are
> maintaining our own internally.

Ditto. But the maintainer of the repo I linked to has thrown caution to the
wind and thrown them all up on a Google Drive:

[https://github.com/Zuikyo/iOS-System-Symbols/blob/master/col...](https://github.com/Zuikyo/iOS-System-Symbols/blob/master/collected-symbol-files.md)

