
AppLovin: Processing 30B Requests A Day - coloneltcb
http://highscalability.com/blog/2015/3/9/applovin-marketing-to-mobile-consumers-worldwide-by-processi.html
======
themartorana
And I was proud of my API backend processing 70M requests a day... That's
suddenly feeling like a lot less traffic and a lot less spectacular.

~~~
elcct
I was able to process about 250M requests a day on a dedicated server (E3)
without breaking a sweat.
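
For scale, the daily totals quoted in this thread can be converted to average per-second rates. This is plain arithmetic on the numbers above (70M, 250M, and the 30B from the title); it assumes traffic is spread evenly over the day, which real traffic is not, so peaks will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(requests_per_day: float) -> float:
    """Average requests per second for a given daily total."""
    return requests_per_day / SECONDS_PER_DAY


for label, daily in [("70M/day", 70e6), ("250M/day", 250e6), ("30B/day", 30e9)]:
    print(f"{label}: ~{per_second(daily):,.0f} requests/second on average")

# 70M/day: ~810 requests/second on average
# 250M/day: ~2,894 requests/second on average
# 30B/day: ~347,222 requests/second on average
```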

~~~
ddorian43
It depends on what you're doing in those requests.

Ex: a realtime stats system. Each request does several get, set, increment,
and push-into-queue operations plus a GeoIP lookup, with one bulk PostgreSQL
upsert every second. 1ms response, 400 requests/second, 3 Python processes,
$10/month shared hosting.
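
The pattern described here (cheap in-memory increments per request, one bulk write per second) can be sketched roughly as below. This is an illustration only, not the commenter's code: `StatsBuffer`, its `flush` callback, and the injectable `clock` are hypothetical names, and the callback is where a single bulk upsert would go.

```python
import time
from collections import Counter
from typing import Callable, Dict


class StatsBuffer:
    """Accumulate per-key increments in memory and hand them off in bulk."""

    def __init__(
        self,
        flush: Callable[[Dict[str, int]], None],
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush        # called with one dict per batch
        self._interval = interval  # seconds between bulk writes
        self._clock = clock        # injectable so the timing can be tested
        self._counts: Counter = Counter()
        self._last_flush = clock()

    def increment(self, key: str, amount: int = 1) -> None:
        """Record an increment; flush if the interval has elapsed."""
        self._counts[key] += amount
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Send the accumulated counts as one batch and reset the buffer."""
        if self._counts:
            batch, self._counts = dict(self._counts), Counter()
            self._flush(batch)
        self._last_flush = self._clock()


if __name__ == "__main__":
    buf = StatsBuffer(flush=print, interval=1.0)
    for _ in range(5):
        buf.increment("page_views")
    buf.flush()  # prints {'page_views': 5}
```

With this shape the database sees one write per interval regardless of request rate. The trade-off is that a crash loses up to one interval of counts, and a flush only happens when a request arrives or `flush()` is called explicitly.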

------
alexatkeplar
Thanks for sharing this - it's great to see this level of detail on an adtech
stack. I wanted to ask a little more about the user profiles in Aerospike.
What process is writing 12k profiles a sec, and what's the read path - is
Aerospike e.g. wired into Nginx using
[https://github.com/aerospike/aerospike-mod-lua](https://github.com/aerospike/aerospike-mod-lua)?
Any gotchas with Aerospike versus e.g. a Cassandra or DynamoDB?

~~~
apple-fan
We don't really use Lua: we connect to Aerospike directly, using a C driver.
This saves a lot of time and allows sub-1ms responses.

I find Aerospike latency a lot more stable compared to that of Cassandra. Our
use case is quite simple: we treat Aerospike as a large and fast key-value
store.

~~~
alstream
Thank you for your input again! May I ask if you run Aerospike on an in-house
cluster to reduce latency, or run on AWS?

------
alstream
Some amazingly cool stuff that any engineer would want to be part of! Could
anyone answer a few questions?

1\. Is the Spark cluster running on top of an in-house cluster or on AWS
EMR/S3?

2\. Any more details about the custom message system, specifically how to
guarantee exactly once delivery?

3\. How do you guarantee data is never dropped?

4\. "Make sure every message can be re-played without data corruption" how to
achieve this when there are so many data points coming in?

Again, thank you for the insights!

~~~
apple-fan
Thank you! Let me try to address those questions:

> 1\. Is the Spark cluster running on top of an in-house cluster or on AWS
> EMR/S3?

It runs on an in-house cluster.

> 2\. Any more details about the custom message system, specifically how to
> guarantee exactly once delivery?

This topic might deserve a separate article. We have built a custom locking
system that ensures proper delivery.

> 3\. How do you guarantee data is never dropped?

We write each data point into 2-3 locations in our system and also back it up
in S3.

> 4\. "Make sure every message can be re-played without data corruption" how
> to achieve this when there are so many data points coming in?

Basically, batch things and keep a lot of signatures and checksums around.
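
One way to read "batch things and keep checksums around" is sketched below: serialize each batch once, store a digest next to it, and refuse to replay a batch whose bytes no longer match. This is a guess at the general idea, not AppLovin's system; `seal_batch`, `open_batch`, and the dict layout are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def seal_batch(batch_id: str, records: List[dict]) -> Dict:
    """Serialize a batch deterministically and attach a SHA-256 digest."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "batch_id": batch_id,
        "payload": payload,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def open_batch(sealed: Dict) -> List[dict]:
    """Return the records, or raise if the payload does not match its digest."""
    if hashlib.sha256(sealed["payload"]).hexdigest() != sealed["sha256"]:
        raise ValueError(f"batch {sealed['batch_id']} failed its checksum")
    return json.loads(sealed["payload"].decode("utf-8"))


if __name__ == "__main__":
    sealed = seal_batch("2015-03-09T10:00", [{"event": "impression", "count": 3}])
    print(open_batch(sealed))  # the original records, verified before replay
```

Checksumming per batch rather than per data point keeps the overhead small at high volume, at the cost of having to discard or re-fetch a whole batch when one byte is wrong.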

~~~
alstream
Thanks for your time!

About point 4, I'm assuming you built idempotency into your message replay
system. Is your unit of idempotency, say, 1 hour or some other unit of time,
or does every single message have idempotent results? Because the latter would
be pretty amazing :)
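
For reference, per-message idempotency in its simplest form looks like the sketch below: each message carries a unique ID, and a message whose ID has already been applied is skipped on replay. The thread does not say which approach AppLovin uses; `IdempotentCounter` and the `(message_id, key, amount)` tuple shape are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class IdempotentCounter:
    """Apply counter updates at most once per message ID."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}
        self._seen: Set[str] = set()  # IDs of messages already applied

    def apply(self, message_id: str, key: str, amount: int) -> bool:
        """Apply one message; return False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.totals[key] = self.totals.get(key, 0) + amount
        return True

    def replay(self, messages: Iterable[Tuple[str, str, int]]) -> int:
        """Replay a stream of messages; return how many were newly applied."""
        return sum(self.apply(*message) for message in messages)


if __name__ == "__main__":
    log = [("m1", "clicks", 1), ("m2", "clicks", 2), ("m1", "clicks", 1)]
    counter = IdempotentCounter()
    print(counter.replay(log), counter.totals)  # 2 {'clicks': 3}
    print(counter.replay(log), counter.totals)  # 0 {'clicks': 3}
```

The cost is the set of seen IDs, which grows with every message. Making the unit of idempotency a time window instead (recompute and overwrite a whole hour's aggregate) bounds that state, which is presumably why the question distinguishes the two.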

------
kumarm
I remember seeing people complaining that their accounts on Google Play got
banned due to security issues with the AppLovin SDK. Example:
[http://forums.makingmoneywithandroid.com/advertising-network...](http://forums.makingmoneywithandroid.com/advertising-networks/19723-applovin-performance.html)

Any comment on that?

~~~
coloneltcb
Hi kumarm - Michael from AppLovin here. Thanks for the comment, it's an
important concern. Here is a blog post outlining the problem and corrective
actions that were taken:
[http://blog.applovin.com/applovin-security-notice/](http://blog.applovin.com/applovin-security-notice/)

~~~
ddorian43
How do you load balance requests between the servers?

~~~
apple-fan
We use a combination of software (ipvs, haproxy) and hardware load balancers.
We plan on gradually moving away from traditional hardware load balancers.

