
How NextRoll uses AWS Batch for daily business operations - manigandham
http://tech.nextroll.com/blog/dev/2019/11/19/aws-batch-at-nextroll.html
======
vrih
This article is actually a bit of an advert for when not to do batch
processing. Processing 30 X 80B events every day is unnecessary when the same
result can be achieved by maintaining state on about 30B users (more than
enough for acceptable global coverage) and streaming the data through to
update that state. If all you were doing was the attribution part, you would
be looking at nearly an order of magnitude saving on their current costs and
delivering results in real time.
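
Roughly the shape of what I mean, as a minimal sketch (the event fields and
the in-memory store are hypothetical, not NextRoll's actual setup):

    from collections import defaultdict

    # One state record per user, updated as events stream in, instead of
    # re-scanning the full event history every day. A real system would
    # use a keyed store rather than an in-memory dict.
    user_state = defaultdict(lambda: {"touches": [], "conversions": 0})

    def handle_event(event):
        state = user_state[event["user_id"]]
        if event["type"] == "impression":
            # Record the touchpoint; entries older than the attribution
            # window can be expired so state stays bounded.
            state["touches"].append(event["ts"])
        elif event["type"] == "conversion":
            # Attribution is decided here, in real time, against state
            # that is already up to date -- no daily batch re-scan.
            state["conversions"] += 1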

The ML side may well benefit from batch, but I bet splitting out the
attribution component would allow them to be more flexible and cost-efficient
in their approach there.

~~~
CaveTech
There are quite a few odd things in their setup; whether it’s numbers inflated
to seem cool or terrible inefficiency, I can’t tell.

Doing 80B predictions a day for 500,000 conversions would make this the
lowest-efficiency ad platform I’ve ever seen by several orders of magnitude.
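
For a sense of scale, using the numbers above (my arithmetic, not theirs):

    predictions_per_day = 80e9
    conversions_per_day = 500_000

    # ~160,000 ML predictions for every recorded conversion
    print(predictions_per_day / conversions_per_day)  # 160000.0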

~~~
vrih
Those numbers aren't too far off. That's making a decision on every auction at
1M QPS, which covers most of what is worth listening to in RTB globally. Most
of those decisions won't actually involve much ML though; it'll be straight
targeting-rule matching and checking whether there is any budget.
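
The QPS figure falls straight out of the daily volume (back-of-the-envelope
only):

    auctions_per_day = 80e9
    seconds_per_day = 24 * 60 * 60  # 86,400

    # ~926K decisions per second, i.e. roughly 1M QPS
    print(auctions_per_day / seconds_per_day)  # ~925,926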

The actual attribution shouldn't be on 80B though. It's pointless analysing
requests you didn't buy. There might be some value in using that data in ML to
feed into a pricing algorithm, but it would be marginal and I doubt the
cost/benefit would ever stack up. It would also technically be in breach of
pretty much every SSP contract I've seen (although everyone does it).

~~~
dialtone
It's a bit more complicated than that, given that attribution looks at the
past 30+ days of data (120 in NextRoll's case) to determine if any marketing
activity happened, and lets the customer adjust the attribution window to
whatever they want.
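
The configurable-window part is roughly this shape (the function and field
names are illustrative, not our actual code):

    from datetime import timedelta

    def attributable_touches(touches, conversion_ts, window_days=120):
        # Keep only touchpoints inside the customer-configurable lookback
        # window; 120 days is the NextRoll default mentioned above.
        cutoff = conversion_ts - timedelta(days=window_days)
        return [t for t in touches if cutoff <= t <= conversion_ts]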

There are 150B+ auctions each day; of those we participate in at least 80B,
and in those 80B there are at least 5 separate predictions:

- determine the type of auction (1st price v 2nd price, for example)
- determine the price likely to win
- determine the likelihood of the placement being viewable
- determine the likelihood of the user clicking
- determine the likelihood of the user converting given that they click

We then run the last 2 for each candidate (campaign, creative) that is
eligible for the current auction. We obviously don't analyse the stuff we
didn't buy, but 80B IS the number of top-level ML-generated prices from our
system.
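
Structurally the cascade looks something like the sketch below (the predict_*
calls are stand-ins for the real models, not our actual code):

    import random

    # Stand-ins for the real models described above.
    def predict_auction_type(a): return "first_price"
    def predict_winning_price(a): return random.uniform(0.5, 2.0)
    def predict_viewability(a): return random.random()
    def predict_click(a, c): return random.random()
    def predict_conv_given_click(a, c): return random.random()

    def price_auction(auction, candidates):
        # Three predictions made once per auction:
        auction_type = predict_auction_type(auction)  # 1st vs 2nd price
        win_price = predict_winning_price(auction)    # price likely to win
        viewability = predict_viewability(auction)    # placement viewable?

        # Two predictions repeated for every eligible (campaign, creative):
        best = None
        for cand in candidates:
            p_click = predict_click(auction, cand)
            p_conv = predict_conv_given_click(auction, cand)
            ev = p_click * p_conv * viewability
            if best is None or ev > best[1]:
                best = (cand, ev)

        # Bid shading and the final price would depend on auction_type
        # and win_price.
        return best, auction_type, win_price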

The budgeting and targeting rules don't apply to the 80B number; those are
slightly different systems.

