
How Sentry Receives 20B Events per Month While Preparing to Handle Twice That - ryangoldman
https://stackshare.io/sentry/how-sentry-receives-20-billion-events-per-month-while-preparing-to-handle-twice-that
======
tyingq
20B/month is interesting in the sense that Sentry should only be getting a
transaction when there's an error in some third party application.

Makes you wonder if a couple of their clients paid for this error monitoring
service, then quit watching it. That's a lot of errors.

~~~
xbzbanna
Suppose you track all client-side errors. Some portion of your users run an ad
blocker that blocks an analytics tool you use. Every blocked request generates
an error. You could suppress the error in that case if you wanted, but why not
keep it and use the graph on Sentry to track how often it's happening?

~~~
acdha
One really big thing I learned from doing this on a public website: there are
a ton of ISPs, browser extensions, antivirus and malware which inject really
horrible JavaScript into every page. If you run Sentry as a front-end error
collector you will get a ton of bizarre-seeming errors for other peoples’ code
and, if you're really lucky, find that some of them used the same variable or
function names as your code.
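One common mitigation is to filter out events whose stack frames point at code you don't serve. A minimal sketch (the patterns, hostnames, and function name below are illustrative, not Sentry's actual configuration, though the browser SDK does expose allow/deny URL options):

```javascript
// Sketch: classify a stack frame's URL as injected third-party code
// (browser extensions, ISP/AV script injection). Patterns are illustrative.
const DENY_PATTERNS = [
  /^chrome-extension:\/\//,
  /^moz-extension:\/\//,
  /^safari-extension:\/\//,
];
const ALLOW_HOSTS = ['myapp.example.com']; // hypothetical first-party host

function isThirdPartyFrame(frameUrl) {
  // Known injection schemes are always third-party.
  if (DENY_PATTERNS.some((re) => re.test(frameUrl))) return true;
  try {
    // Anything not served from our own host is treated as injected.
    return !ALLOW_HOSTS.includes(new URL(frameUrl).hostname);
  } catch (e) {
    return true; // unparseable URL: treat as injected
  }
}
```

Hooking a check like this into the error collector drops the bizarre-seeming noise while keeping first-party errors, at the cost of occasionally discarding a genuine error that an injector's code happened to trigger.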

~~~
numbsafari
You can set it up so that Sentry only tracks errors on a subset of pages. One
approach I've seen is to have 100% tracking on your canary deployments or for
a few minutes/hours after a deployment, and then use the tried and true `if
(rand() < .2) { sentry.log(...); }` approach to limit the errors you see. It
_does_ mean that you may not receive certain critical errors, but if you are
trying to stay under their billing tier thresholds or just trying to rate
limit yourself, it's a reasonable trade-off (especially if you are logging
100% on canaries or for a period after rollouts). You can also customize the
error rate per page so that, for example, you get 100% of errors on checkout
but only 10% of errors on the homepage.
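The per-page variant of that `rand() < .2` trick can be sketched like this (the rates, paths, and function names are made up for illustration; the `rand` parameter is injectable only to make the sketch testable):

```javascript
// Hypothetical per-page sample rates; checkout gets everything,
// the noisy homepage gets 10%, everything else falls back to 20%.
const SAMPLE_RATES = { '/checkout': 1.0, '/': 0.1 };
const DEFAULT_RATE = 0.2; // the `rand() < .2` default from above

function shouldReport(path, rand = Math.random) {
  const rate =
    SAMPLE_RATES[path] !== undefined ? SAMPLE_RATES[path] : DEFAULT_RATE;
  return rand() < rate;
}

// Wrap the (hypothetical) logger so call sites stay unchanged:
function sampledLog(path, event, rand = Math.random) {
  if (shouldReport(path, rand)) {
    // sentry.log(event); // forward to the real client here
    return true;
  }
  return false;
}
```

During a canary window you'd simply override the table with a flat rate of 1.0.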

~~~
acdha
I was self-hosting, so it was a question of seeing traffic rather than quotas.
I'm loath to sample errors, since it often seems like the outliers are the
most interesting, but at some point it's expensive to avoid.

------
dabernathy89
We started using paid Sentry recently (previously were on an old self-hosted
version). It's mostly been great, but boy is search clunky. Very difficult to
find anything that doesn't automatically appear on the first page of recent
events.

~~~
zeeg
It's definitely on our short list of problems to solve, but unfortunately the
solution takes a bit more than traditional feature development.

~~~
Waterluvian
This statement has me very curious. Are you able to elaborate at all on what
you mean? Like is this more of an underpinning architecture issue that can't
simply be addressed by scheduling dev time?

~~~
zeeg
The storage mechanism isn't fast/flexible enough to do anything really great.
It's fairly easy for us to do precise matches (with some caveats), but given
everything is cached at the database level we can't easily compute results
based on your search query.

Effectively what we do is:

SELECT * FROM groups WHERE id IN (
    SELECT group_id FROM index
    WHERE key = :searchKey AND value = :searchValue
    UNION [...]
)

(This is a simplification, but it should give you an idea of how the queries
are built.)

Because of the way the model works, it's very hard to do certain things like
exclusion queries, and more importantly all of the results you're seeing are
still cached. The biggest pain point here is if you're searching for e.g.
"ConnectionError environment:production", you really don't want to see
anything related to non-production. We're solving that problem immediately,
but it's just the tip of the iceberg.
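The environment pain point falls out of a toy model of that index lookup (invented data, not Sentry's actual schema): the UNION returns groups matching *any* search tag, while the searcher expects groups matching *all* of them.

```javascript
// Toy tag index: one row per (group, key, value), loosely mirroring
// the subquery above. Data is invented for illustration.
const index = [
  { groupId: 1, key: 'error', value: 'ConnectionError' },
  { groupId: 1, key: 'environment', value: 'staging' },
  { groupId: 2, key: 'error', value: 'ConnectionError' },
  { groupId: 2, key: 'environment', value: 'production' },
];

// What the UNION effectively computes: groups matching ANY search tag.
function unionMatch(tags) {
  const hit = (r) => tags.some((t) => t.key === r.key && t.value === r.value);
  return [...new Set(index.filter(hit).map((r) => r.groupId))];
}

// What the searcher wants: groups matching ALL search tags.
function intersectMatch(tags) {
  const all = [...new Set(index.map((r) => r.groupId))];
  return all.filter((g) =>
    tags.every((t) =>
      index.some((r) => r.groupId === g && r.key === t.key && r.value === t.value)
    )
  );
}
```

For `ConnectionError environment:production`, `unionMatch` returns both groups (the staging group matched on the error tag alone), while `intersectMatch` returns only the production group.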

Next year we're kicking off a large project to overhaul the key infrastructure
which powers the stream/search functionality, with some pretty big ambitions.

Other services in the space generally use something like Elasticsearch, which
can provide some of this out of the box. We've always been built on SQL/Redis,
and given that Elasticsearch has its own set of problems, we've decided that
it's likely best for us to move to a columnar store format that doesn't cache
results (e.g. counts), but rather computes them in real-time, much like Scuba.
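A minimal sketch of that read path (toy data and names; nothing here is Scuba's or Sentry's actual format): store each field as its own column and compute counts per query instead of reading a pre-cached counter.

```javascript
// Toy columnar store: one parallel array per field, one index per event.
const cols = {
  error: ['ConnectionError', 'TypeError', 'ConnectionError'],
  environment: ['production', 'production', 'staging'],
};

// Count matching rows at query time rather than maintaining cached counts.
function countWhere(filter) {
  const len = cols[Object.keys(cols)[0]].length;
  let n = 0;
  for (let i = 0; i < len; i++) {
    if (Object.entries(filter).every(([k, v]) => cols[k][i] === v)) n++;
  }
  return n;
}
```

Because nothing is pre-aggregated, arbitrary filters (including the exclusion and environment-scoped queries above) cost one scan instead of a schema change.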

~~~
js2
Sentry 6.x experimented with throwing various attributes of the event object
into ES. Have you ruled out using ES in the future?

FWIW, I'm still running a version of 6.x where I put the tags for every event
into Splunk of all things. The Splunk events then link back to the
corresponding Sentry event. It's slow and klunky, but it gives the Sentry
users at $dayjob a much better search interface and ability to slice and dice
on tags.

(Note, I'm not suggesting you use Splunk in your backend!)

~~~
the_mitsuhiko
I think we’re pretty sure about not using Elastic. We’ve toyed with it more
than once and we’re not confident we can solve our problems with it.

------
mclarke
seriously though, that gif of Matt is excellent

~~~
buryat
yeah, 9MiB for a gif is excellent

~~~
raresp
have you heard about Lazy Image Load?

~~~
mattrobenolt
Why would you want to lazy load it?

------
austinpray
I had the pleasure of hanging out with the Sentry folks in SF for a couple
days last week. Honestly a top-notch group.

It was eye-opening to see what is possible when you have an organization like
theirs where you have a lot of talent and pride in your product across the
board.

------
baristaGeek
We're using it and it 'magically' detects errors in CI that no human would've
detected.

------
debacle
How does this compare to other competitors in the RUM space, including open
source tools?

~~~
the_mitsuhiko
I'm not sure what rum is (other than an alcoholic beverage) but as far as I
know Sentry is the only actively maintained Open Source tool in its space
(which is crash reporting). Might be wrong on that though.

~~~
rkwz
Real User Monitoring
[https://en.wikipedia.org/wiki/Real_user_monitoring](https://en.wikipedia.org/wiki/Real_user_monitoring)

------
CyberDildonics
Is 7,700 per second really that impressive?

------
synthc
too bad the dashboard is slow as molasses

~~~
zeeg
Could you reach out to support with your account details? There's a cost to
the data volume, but generally queries should respond very quickly.

------
Yokohiii
tl;dr we are hiring

------
DmitryOlshansky
Hate these X millions / billions per month or year or a century.

Let’s do the math.

So it’s 20B / 30 ≈ 667m / day

667m / 24 ≈ 28m / hour

28m / 3.6k sec ≈ 7,700 events/sec

Now peaks have got to be larger than these measly 7,700 events/sec.

Even say peak is 50k/second.

Dang, Nginx can serve a web page faster than that on a single modern
notebook. The CPU won’t even saturate doing it.

Or if you were to write them to a consumer-grade spinning disk at
100 MB/s, you’d get 50k/s if each event is 2 KB in size.
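Redoing that back-of-the-envelope math without intermediate rounding:

```javascript
// Average ingest rate: 20 billion events spread over a 30-day month.
const eventsPerSec = 20e9 / 30 / 24 / 3600; // ≈ 7,716 events/sec

// Sequential-write ceiling: 100 MB/s divided by 2 KB per event.
const diskEventsPerSec = 100e6 / 2000; // = 50,000 events/sec
```

So the average works out to roughly the 7,700/sec figure from the article, well under the raw write bandwidth of a single spinning disk.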

As usual I’m not impressed.

~~~
StavrosK
> Hate these X millions / billions per month or year or a century.

I hear you, I hate misleading examples too.

> Dang an Nginx can serve a web page faster then that on a single modern
> notebook.

"What? Your moon-rocket can only fit three people? My couch can do six. I am
not impressed."

~~~
mattrobenolt
What kind of couch do you have that can fit 6 people? Looking for a new couch.

~~~
StavrosK
Sectional, yes. I recommend it, it's very comfortable.

