
How We Manage a Million Push Notifications an Hour - shadykiller
https://blog.gojekengineering.com/how-we-manage-a-million-push-notifications-an-hour-549a1e3ca2c2
======
gdeglin
We had a similar challenge sending this many notifications at OneSignal, which
we solved using Rust.

We recently hit a peak of 850 Million notifications per second, and 5 billion
notifications per day. Here's a blog post on how we do it, written back when
we were at "only" 2 billion notifications per week:
[https://onesignal.com/blog/rust-at-onesignal/](https://onesignal.com/blog/rust-at-onesignal/)

~~~
tempguy9999
No offence, but is this correct?

> We recently hit a peak of 850 Million notifications per second

per _second_?

From your blog "OnePush is fast - We've observed sustained deliveries up to
125,000/second and spikes up to 175,000/second."

I think you may have a typo. The bandwidth would be incredible too: even at an
unlikely 10 bytes per delivery, that would be 8.5 GB/s.

~~~
gdeglin
This post is now a couple of years old, so we've grown a lot since then. Here's a
newer (more marketing-centric) post where we announced our 850k/second
milestone: [https://onesignal.com/blog/throughput-record/](https://onesignal.com/blog/throughput-record/)

It's generally under 4 bytes per delivery, depending on the content, and we
have several delivery servers. APNS, for example, doesn't support payloads
larger than 4 bytes.

~~~
Arkanta
I think you mean 4 KB rather than 4 bytes

------
londons_explore
1 million per hour is only about 300 per second. On a 1.5 GHz, 4-core Raspberry
Pi, that gives you about 21 million clock cycles to deal with each message.

The architecture seems rather overengineered considering a single Raspberry Pi
could do the job, even after 100x scaling!
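The arithmetic checks out if you use the exact average of 1,000,000 / 3600 ≈ 278 messages per second rather than the rounded 300. A minimal sketch of the calculation, taking the 1.5 GHz, 4-core figures from the comment and assuming messages arrive at a perfectly even rate (an assumption questioned further down the thread):

```python
# Back-of-the-envelope CPU budget. Clock speed and core count are the
# commenter's assumed Raspberry Pi figures, not measurements.
SECONDS_PER_HOUR = 3600
MESSAGES_PER_HOUR = 1_000_000
CLOCK_HZ = 1.5e9  # 1.5 GHz per core
CORES = 4


def messages_per_second(per_hour: float) -> float:
    """Average rate, assuming messages are spread evenly over the hour."""
    return per_hour / SECONDS_PER_HOUR


def cycles_per_message(per_hour: float, clock_hz: float, cores: int) -> float:
    """Cycles available per second across all cores, divided by the average rate."""
    return clock_hz * cores / messages_per_second(per_hour)


rate = messages_per_second(MESSAGES_PER_HOUR)
budget = cycles_per_message(MESSAGES_PER_HOUR, CLOCK_HZ, CORES)
budget_100x = cycles_per_message(100 * MESSAGES_PER_HOUR, CLOCK_HZ, CORES)
print(f"{rate:.0f} msg/s on average")
print(f"{budget / 1e6:.1f} million cycles per message")
print(f"{budget_100x:,.0f} cycles per message after 100x scaling")
```

That works out to roughly 21.6 million cycles per message, and about 216,000 after 100x scaling. It is a CPU budget only; it ignores network I/O, TLS connections to the push providers and memory, any of which may well be the real limit.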

~~~
thewarrior
And what happens if the Raspberry Pi goes down for an hour?

~~~
londons_explore
You spend $35 on a spare.

~~~
thewarrior
And what if you don’t want to lose the notifications while it’s down?
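One common answer to this question is to journal each notification to durable storage before sending it and to mark it done only after the provider accepts it, so a restart can replay whatever was in flight. A minimal sketch of that idea, assuming a single process and a local append-only file; the `Spool` class and its JSON-lines format are hypothetical, not part of the Gojek or OneSignal designs:

```python
import json
import os
import tempfile
from typing import Any, Dict


class Spool:
    """Hypothetical append-only journal: record a notification before sending,
    record an ack once the provider accepts it, replay un-acked ones on restart."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _append(self, record: Dict[str, Any]) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk so it survives a crash

    def enqueue(self, msg_id: str, payload: Dict[str, Any]) -> None:
        self._append({"op": "enqueue", "id": msg_id, "payload": payload})

    def ack(self, msg_id: str) -> None:
        self._append({"op": "ack", "id": msg_id})

    def pending(self) -> Dict[str, Dict[str, Any]]:
        """Rebuild the set of un-acked notifications by replaying the journal."""
        out: Dict[str, Dict[str, Any]] = {}
        if not os.path.exists(self.path):
            return out
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "enqueue":
                    out[rec["id"]] = rec["payload"]
                else:
                    out.pop(rec["id"], None)
        return out


path = os.path.join(tempfile.mkdtemp(), "spool.jsonl")
spool = Spool(path)
spool.enqueue("n1", {"user": 42, "text": "Your driver has arrived"})
spool.enqueue("n2", {"user": 7, "text": "Order confirmed"})
spool.ack("n1")  # the provider accepted n1 before the crash
restarted = Spool(path)  # simulate a restart: state comes only from the file
print(restarted.pending())
```

This gives at-least-once delivery: a message acked by the provider but not yet journaled as acked will be resent after a restart. It also only covers notifications that reached the box before it died; requests arriving while it is down still have to be buffered upstream, which is what a separate message broker is usually for.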

------
lachlan-sneff
1 million events an hour is just 278 a second, which is a much more
surmountable number than 1 million.

~~~
fourier76
What if you get one push event on the first second, and then 999,999 on the
last second? Still one million per hour. Don’t assume the events are evenly
distributed. I see this a lot on HN. Yes, 1,000,000 divided by 3600 is indeed
about 278, but why would we assume it was 278 notifications exactly every
second?
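To make the distinction concrete: the average rate over an hour and the peak rate in any one second can differ by orders of magnitude. A minimal sketch that buckets hypothetical event timestamps (in seconds) into one-second windows and compares the two, for the extreme case above and for a perfectly even spread:

```python
from collections import Counter
from typing import Sequence

HOUR = 3600


def peak_per_second(timestamps: Sequence[float]) -> int:
    """Largest number of events falling into any single one-second bucket."""
    buckets = Counter(int(t) for t in timestamps)
    return max(buckets.values(), default=0)


def average_per_second(timestamps: Sequence[float], window_s: float) -> float:
    """Mean rate over the whole window, which hides any burstiness."""
    return len(timestamps) / window_s


# The extreme case: one event at t=0, the other 999,999 in the last second.
bursty = [0.0] + [HOUR - 1 + i / 1_000_000 for i in range(999_999)]
# The same million events spread evenly across the hour.
even = [i * HOUR / 1_000_000 for i in range(1_000_000)]

for name, events in (("bursty", bursty), ("even", even)):
    print(name, round(average_per_second(events, HOUR)), peak_per_second(events))
```

Both lists average about 278 events per second, but the bursty one peaks at 999,999 in a single second while the even one never exceeds 278.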

~~~
pkaye
Then they should have said they can handle a million push notifications a
second.

~~~
fourier76
That doesn’t make sense?

~~~
basilgohar
I think very few people, even on Hacker News, are pedantic enough to argue
strongly that someone is lying when they call 999,999 notifications in one
second "a million".

------
ian0
I used to have pretty much the same reaction as most of the commenters here to
Gojek engineering posts. Coming from a telco background, I found the RPS figures
quite low and the complexity of the architecture completely over the top.

However, I now know more about the company. They have one of the leanest
engineering teams among comparable companies here. Their CTO & VP Eng are
incredibly practical guys whose advice would resonate with pretty much
everyone here (e.g. the interview here[0], where they reiterate the dangers of
scaling too fast). They have small multifunctional product teams, and the
microservices architecture fits this (as opposed to being built for fanciness
or to pad résumés). And they did it. There are dozens of "features" in the app
which are entire giant businesses in their own right.

So I'd imagine the shortest path to getting this up and running was
definitely considered. It would have been nice if they had gone into more
detail about why simpler or outsourced solutions wouldn't work, but I'd guess
there's a good reason.

[0]
[https://www.youtube.com/watch?v=He0XBBfCEVk](https://www.youtube.com/watch?v=He0XBBfCEVk)

~~~
mkagenius
> but I'd guess there's a good reason.

Sometimes there are no good reasons, and we should be able to question them;
that's how progress is made, in science and in engineering.

------
lwansbrough
I'm currently implementing a notification system for our network of sites.
Aside from the HTTP microservice, my implementation is about the same, which
isn't much of a surprise: I expect this is a fairly common way of doing it.

With that said, I was surprised to learn how complicated it can be to send
notifications cross platform. Fortunately we're targeting the web, so I really
only have to worry about two push provider implementations: APNs and VAPID. I
really hope Apple agrees to implement VAPID sooner rather than later so developers
can stop wasting their time.

We have used OneSignal in the past, but there's something so satisfying about
delivering the notification yourself. Also, a word of caution for people new
to push notifications: we discovered early on that sending a million push
notifications all at once is a really good way of crashing your site, as they
have a surprisingly high click-through rate in the first minute of sending!
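A common mitigation is to spread a large campaign over several minutes instead of sending it all at once, so the resulting wave of clicks is spread out too. A minimal sketch, assuming a hypothetical `send_batch` callback that hands a batch of device tokens to the push provider; the batch size and the 10-minute window in the usage lines are illustrative, not recommendations:

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_spread_out(
    tokens: Sequence[str],
    send_batch: Callable[[Sequence[str]], None],
    total_seconds: float,
    batch_size: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send `tokens` in batches spaced evenly across `total_seconds`.

    Returns the number of batches sent.
    """
    chunks = list(batches(tokens, batch_size))
    if not chunks:
        return 0
    gap = total_seconds / len(chunks)
    for index, chunk in enumerate(chunks):
        send_batch(chunk)
        if index < len(chunks) - 1:  # no need to wait after the last batch
            sleep(gap)
    return len(chunks)


# Usage with fakes: 10,000 recipients over 10 minutes in batches of 1,000.
sent: List[int] = []
waits: List[float] = []
tokens = [f"token-{i}" for i in range(10_000)]
batch_count = send_spread_out(
    tokens, lambda chunk: sent.append(len(chunk)), 600, 1_000, sleep=waits.append
)
```

Passing `sleep` in as a parameter lets the pacing be checked without real waiting, as the usage lines do by recording the requested gaps in `waits`.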

~~~
philjohn
Why not use Amazon SNS? They abstract away the differences between APNS and
whatever Google are calling their push infrastructure these days.

------
kondro
I don’t mean to discount the work done here, but why would a dev team build
this themselves these days? There are many practically infinitely scalable
services that do just this with various levels of sophistication.

They range from simple solutions like AWS SNS ($0.50/million messages) or
Google Firebase (free), which do multi-device push messaging while managing
keys, redelivery and delivery responses, to more managed services like Urban
Airship and OneSignal.
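For scale, the quoted SNS price works out to a modest bill at the article's volume. A rough sketch, assuming a constant 1 million notifications an hour for a 30-day month and the $0.50/million figure from the comment (actual AWS pricing, free tiers and regional differences are not modelled):

```python
# Price as quoted in the comment above; check current AWS pricing before relying on it.
SNS_PRICE_PER_MILLION_USD = 0.50


def monthly_cost_usd(per_hour: int, price_per_million: float, days: int = 30) -> float:
    """Estimated monthly bill at a constant hourly volume."""
    messages = per_hour * 24 * days
    return messages / 1_000_000 * price_per_million


cost = monthly_cost_usd(1_000_000, SNS_PRICE_PER_MILLION_USD)
print(f"${cost:,.2f} per month")
```

That comes to 720 million messages and about $360 a month.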

The actual Android delivery component has to end up going through Firebase
anyway.

The only part of this solution SNS and Firebase don’t provide is the user
fan-out functionality.
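Fan-out here means expanding a list of users into the device tokens each provider needs, since one user may have several phones. A minimal sketch, assuming a hypothetical in-memory `registry` mapping user IDs to `Device` records; a real service would read this from a database and prune tokens the providers report as invalid:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Device:
    token: str
    platform: str  # hypothetical labels: "android" (FCM) or "ios" (APNs)


def fan_out(user_ids: Iterable[str], registry: Dict[str, List[Device]]) -> Dict[str, List[str]]:
    """Expand user IDs into per-platform lists of device tokens.

    Users with no registered devices are skipped, and a token that shows up
    more than once is only sent once.
    """
    by_platform: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for user_id in user_ids:
        for device in registry.get(user_id, []):
            if device.token in seen:
                continue
            seen.add(device.token)
            by_platform[device.platform].append(device.token)
    return dict(by_platform)


registry = {
    "u1": [Device("fcm-a1", "android"), Device("apns-i1", "ios")],
    "u2": [Device("fcm-a2", "android")],
}
print(fan_out(["u1", "u2", "u3"], registry))
```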

Surely there are more important features delivering business value to be
built, rather than re-inventing (and then scaling and supporting) the delivery
of mobile notifications.

~~~
thewarrior
Pretty soon your app is just a bunch of proprietary cloud provider products
glued together and now there’s vendor lock-in and much less flexibility.

You might want to throttle messages based on user settings or, say, ML models to
avoid too many annoying messages. For critical stuff you might want to switch
over to email or SMS.
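A policy like this could sit in front of the providers as a small decision function. A minimal sketch with hypothetical names (`UserPrefs`, `choose_channel`), using a simple daily cap as a stand-in for the ML-based throttling mentioned above:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserPrefs:
    max_push_per_day: int
    # Ordered fallbacks for critical messages, e.g. ["sms", "email"] (hypothetical).
    fallback_channels: List[str] = field(default_factory=list)


def choose_channel(
    prefs: UserPrefs,
    pushes_sent_today: int,
    critical: bool,
    push_deliverable: bool,
) -> Optional[str]:
    """Pick a delivery channel, or return None to suppress the message.

    Non-critical messages are dropped once the user's daily push cap is hit.
    Critical messages ignore the cap and fall back to another channel when
    push delivery is not possible (e.g. no valid device token).
    """
    if not critical and pushes_sent_today >= prefs.max_push_per_day:
        return None
    if push_deliverable:
        return "push"
    if critical and prefs.fallback_channels:
        return prefs.fallback_channels[0]
    return None


prefs = UserPrefs(max_push_per_day=3, fallback_channels=["sms", "email"])
print(choose_channel(prefs, pushes_sent_today=3, critical=False, push_deliverable=True))  # None
print(choose_channel(prefs, pushes_sent_today=5, critical=True, push_deliverable=False))  # sms
```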

These are startups that have the funding to do this kind of thing. This is a
core part of the stack for any such company.

~~~
kondro
If that's your concern, lean on Firebase.

You _have_ to use Firebase to deliver messages to Android anyway, and having it
deliver to APNS as well comes for free with that implementation.

Maybe you _do_ want to do those things, but this implementation doesn't, nor
do they mention the intention to do that. But if you did, SNS also provides
implementations for delivering SMS and Email (preferably through SES).

Startups _shouldn't_ waste their money on this type of thing, especially
considering most startups will fail. Vendor lock-in is a pretty stupid thing
to worry about for an early-stage startup (and I'd argue it's stupid to worry
about long-term too if that vendor is AWS, Google or Azure, who are all very
competitive with each other).

~~~
TheCycoONE
Having middleware to abstract Firebase's API away from the services makes sense,
if only to minimize the disruption the next time Google changes it. I've always
left tokens up to the callers to sort out before calling the notification
service, but I see the appeal of centralizing it if you have a lot of small
products. I agree there is no longer any good reason to directly target APNS.
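Such middleware amounts to one internal interface with an adapter per vendor, so a change in Google's or Apple's API touches only one adapter. A minimal sketch using `typing.Protocol`; `RecordingProvider` and `NotificationService` are hypothetical, and the adapter records calls instead of contacting FCM or APNs:

```python
from typing import Dict, List, Protocol, Tuple


class PushProvider(Protocol):
    """The small interface the rest of the company codes against."""

    def send(self, token: str, title: str, body: str) -> bool: ...


class RecordingProvider:
    """Hypothetical stand-in adapter. A real one would translate this call
    into the vendor's own request format (FCM, APNs, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent: List[Tuple[str, str, str]] = []

    def send(self, token: str, title: str, body: str) -> bool:
        self.sent.append((token, title, body))
        return True


class NotificationService:
    """Routes each request to the provider registered for its platform."""

    def __init__(self, providers: Dict[str, PushProvider]) -> None:
        self.providers = providers

    def notify(self, platform: str, token: str, title: str, body: str) -> bool:
        provider = self.providers.get(platform)
        if provider is None:
            raise ValueError(f"no push provider registered for {platform!r}")
        return provider.send(token, title, body)


android = RecordingProvider("fcm")
service = NotificationService({"android": android})
ok = service.notify("android", "tok-1", "Hi", "Your order is on its way")
```

Swapping vendors, or sending both platforms through Firebase, then means registering a different adapter rather than changing every calling product.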

Nothing in the article really said how they got to millions an hour. In my
experience the notifications are cheap enough to send that that figure is easy
to obtain without any special effort on the push side. It's also
embarrassingly parallel/easy to scale. How everything else scales around that
(recipient filtering, reporting) is the interesting part.
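Because each notification is independent, sending parallelises trivially: a pool of workers can each handle one request without coordinating. A minimal sketch with `concurrent.futures.ThreadPoolExecutor`; `send_one` is a hypothetical placeholder for the HTTP call to a push provider, with tokens starting with "bad" simulating rejected registrations:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Tuple


def send_one(token: str) -> bool:
    """Hypothetical stand-in for one request to a push provider."""
    return not token.startswith("bad")


def send_all(tokens: Iterable[str], workers: int = 32) -> Tuple[int, int]:
    """Send every notification independently; returns (succeeded, failed)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send_one, tokens))
    succeeded = sum(results)
    return succeeded, len(results) - succeeded


print(send_all(["t1", "t2", "bad-3"]))
```

Threads suit this because the real work is waiting on the network. The harder parts named above (recipient filtering, reporting) are not covered by this sketch.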

------
kaycebasques
I had a blast using Go-Jek and Grab when I visited Indonesia. It’s basically
Uber/Lyft, except you sit on the back seat of a scooter rather than in a car.

[https://www.instagram.com/p/BZHty8NnTVZ/?igshid=5qg5ndm9r9ui](https://www.instagram.com/p/BZHty8NnTVZ/?igshid=5qg5ndm9r9ui)

Of course, as others mentioned, it’s a super app so they do many other things,
but I believe this is one of their main businesses.

~~~
thelittleone
Go-Jek has been huge for us as holiday rental managers in Bali. But lately
there seems to be some sort of racket by drivers:

Example for Go-send (sending a package): "I have motorbike problem please can
you cancel".

A minute after cancelling I get a message on WhatsApp: "Hey fixed my
motorbike, still need?". Same thing with food orders.

Example for the Go-food: "Hey I'm at the restaurant, the phone is down. Can
you cancel order? I will do manually".

Obviously the restaurant and the driver get the full customer payment this
way. Go-jek gets zero. This depends on the customer cancelling, as otherwise
the driver's rating takes a hit.

~~~
reportgunner
Wow that's really clever!

~~~
thelittleone
The average monthly full-time salary in Bali for a local is around $270 USD. A
Gofood order could easily be $100. Last time I checked Gojek takes 20% of the
order value. On a $100 order that could get the driver $10 (assuming they split the
$20 evenly with the restaurant). One a day for a month and you've made an
additional average monthly salary. If you think about that in terms of a
Western IT annual income (assume $60k) that's an extra $5k a month.

------
shanipribadi
The title of the blog post is actually clickbait.

The point of the article was more that when you have multiple teams, each
owning multiple products, having a single, well-defined abstraction over
external dependencies, provided as a service that you control, is important to
manage the risks posed by those external dependencies and to make overall
maintenance easier.

------
srameshc
This is cool, but I found this thing called the MQTT protocol, and Emitter.io is
one implementation of it. You can build your own notification system on top of
it at a much lower cost.

------
jbverschoor
Why is less than 300 tx/sec such a big deal?

------
jzl
I appreciate the positive effort to communicate something useful, but this
basically came down to "We use RabbitMQ".

------
didibus
I've seen a few good example blog posts discussing this scale (10 to 500
transactions per second) and learned a lot from all of them.

Does anyone know of any description of architectures scaling to the next
order of magnitude? 1,000 to 10,000 TPS? And even higher, say 100,000 to
1,000,000 TPS?

~~~
random42
It's difficult to "guess" ahead of time how you are going to scale for the
next 10x, because you need to know:

a.) Which component is going to start breaking, and that depends on the usage
pattern.

b.) Which business/tech compromises are ok to make, and that depends on a.)

Generally speaking, though, you'd benchmark the system to find the
bottleneck component and, depending on its nature, either throw more
hardware at it (horizontal or vertical scaling) or optimize the software.
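A minimal sketch of the "benchmark to find the bottleneck" step: time each stage of a pipeline on a sample message and report the slowest. The stage names and workloads are hypothetical, with one made deliberately expensive; in practice you would measure under realistic load and concurrency rather than single calls in a loop:

```python
import json
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def profile_stages(stages: List[Stage], sample: Any, repeats: int = 200) -> Dict[str, float]:
    """Average wall-clock seconds per call for each pipeline stage."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        for _ in range(repeats):
            fn(sample)
        timings[name] = (time.perf_counter() - start) / repeats
    return timings


def bottleneck(timings: Dict[str, float]) -> str:
    """The slowest stage is the first candidate for optimisation or more hardware."""
    return max(timings, key=timings.__getitem__)


# Hypothetical pipeline; the costs are made up for illustration.
stages: List[Stage] = [
    ("render_payload", lambda msg: json.dumps(msg)),
    ("lookup_tokens", lambda msg: {"u1": ["tok"]}.get(msg["user"])),
    ("filter_recipients", lambda msg: sum(i * i for i in range(20_000))),  # deliberately slow
]
timings = profile_stages(stages, {"user": "u1", "text": "hello"})
print(bottleneck(timings))
```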

------
finchisko
I thought this would be about implementing a custom notification server and not
using Google's or Apple's.

------
reportgunner
Is that a lot of notifications?

------
uvu
I have done something similar, and I think mine does much better than theirs.
But I didn't pass their interview, haha.

------
dis-sys
1 million notifications per hour, or ~300 per second, seriously?

