
How Ipdata Serves 25M API Calls from 10 Global Endpoints for $150 a Month - jonathan-kosgei
http://highscalability.com/blog/2018/4/2/how-ipdata-serves-25m-api-calls-from-10-infinitely-scalable.html
======
orf
Interesting read, thanks for posting, but I think you can do better. I've been
playing with Fastly and VCL recently and you could write your whole app in VCL
and have it served right from the CDN. 25 million requests would cost about
$20. If you include the API keys in the log output and batch-process the logs
every day/hour/whatever to keep usage stats, wouldn't that remove the need
for Dynamo? You'd then keep a table of banned API keys and update it
dynamically through the Fastly API.
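
A minimal sketch of the batch step, assuming a hypothetical log line format of `<timestamp> <api_key> <client_ip> <status>` (the real format is whatever you configure in your CDN's logging). It counts requests per key and lists keys over quota, which are the candidates for the banned-key table; the function names and quota scheme are illustrative only.

```python
from collections import Counter


def usage_by_key(log_lines):
    """Count requests per API key.

    Assumes a hypothetical format: '<timestamp> <api_key> <client_ip> <status>'.
    Lines with fewer than four fields are skipped as malformed.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # truncated or garbage line
        counts[fields[1]] += 1
    return counts


def keys_over_quota(counts, quotas, default_quota):
    """Return (sorted) keys whose usage exceeds their quota; these would be banned."""
    return sorted(k for k, n in counts.items() if n > quotas.get(k, default_quota))


logs = [
    "2018-04-02T10:00:00Z key_a 1.2.3.4 200",
    "2018-04-02T10:00:01Z key_b 5.6.7.8 200",
    "2018-04-02T10:00:02Z key_a 1.2.3.4 200",
    "garbage",
]
counts = usage_by_key(logs)
print(counts["key_a"], counts["key_b"])                        # 2 1
print(keys_over_quota(counts, {"key_b": 5}, default_quota=1))  # ['key_a']
```

Running this hourly or daily over the log files keeps the request path free of any per-call database write.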

Just a thought. Still, $150 p/m is not bad!

~~~
zzzcpan
Considering it's like 20 simple requests per second, you can do it on a couple
of tiny virtual servers (for geo and redundancy), totaling less than $10 per
month.
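
For scale, a back-of-the-envelope check, assuming a 30-day month and traffic spread evenly (real traffic will be burstier, so a rough figure of 20/s leaves about 2x headroom over the average):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def average_rps(requests_per_month, seconds=SECONDS_PER_MONTH):
    """Average requests per second if the monthly volume were perfectly uniform."""
    return requests_per_month / seconds


rps = average_rps(25_000_000)
print(f"{rps:.1f} requests/second on average")  # 9.6 requests/second on average
```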

~~~
JrSchild
My strategy is to use Cloudflare Workers to invoke a Lambda function. This way
I have a cheap AWS API Gateway replacement: $3.70 per 1M requests becomes only
$0.50. That costs me only $12.50 for 25M requests, plus the Lambda invocations,
which are hella cheap.
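
The arithmetic behind those numbers, using only the per-million prices quoted in the comment (Lambda invocation charges are excluded, as the comment counts them separately):

```python
REQUESTS = 25_000_000
API_GATEWAY_PER_MILLION = 3.70  # price quoted in the comment above
WORKERS_PER_MILLION = 0.50      # price quoted in the comment above


def request_cost(requests, price_per_million):
    """Request charges only; excludes Lambda, data transfer, and any plan minimums."""
    return requests / 1_000_000 * price_per_million


gateway = request_cost(REQUESTS, API_GATEWAY_PER_MILLION)
workers = request_cost(REQUESTS, WORKERS_PER_MILLION)
print(f"API Gateway: ${gateway:.2f}, Workers: ${workers:.2f}")
# API Gateway: $92.50, Workers: $12.50
```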

~~~
jonathan-kosgei
Interesting strategy mate! Any links on how to do that? I've googled a bit but
haven't found anything on this!

~~~
JrSchild
[https://blog.cloudflare.com/cloudflare-workers-unleashed/](https://blog.cloudflare.com/cloudflare-workers-unleashed/)

Yes, check this out. You'll have to bundle the AWS SDK into a single
JavaScript file that runs as a Web Worker. These are distributed to over 100
servers worldwide. Then invoke the Lambda function from the worker (see the
link below). You could potentially distribute this further by invoking a
Lambda function geographically close to that specific worker, but this setup
will already be very fast.

[https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lamb...](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html#invoke-property)

~~~
jonathan-kosgei
Awesome! Thanks!

------
seanp2k2
That does seem good, but when you consider that it’s basically just doing a DB
query for an IP (and there might be a clever way to store CIDRs so that the DB
finds the range containing the IP) plus authentication, authorization, and
accounting (AAA), it becomes a bit less impressive. I’d guess that the AAA stuff
is actually more work than the lookups themselves. I’ve built services before
that use GeoIP with in-memory lookups. Since the data doesn’t change that
frequently, aside from AAA you don’t really need an external DB at all: just
have the executable itself contain the lookup info, and now you’re down to just
Lambda. Actually, don’t even build a service; just distribute a binary with
the tables inside, so customers don’t even need to make a call, wait, and
pay for each call. Consider how well _that_ would scale. Indeed, this is
already the business model for some of the MaxMind products (e.g.
[https://www.maxmind.com/en/geoip2-city](https://www.maxmind.com/en/geoip2-city)),
but it doesn’t make for as interesting an article.
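
The "clever way to store CIDRs" can be done in memory with no DB at all: convert each block to an integer start/end pair, sort by start, and binary-search. A minimal sketch, assuming non-overlapping IPv4 blocks; the sample blocks are RFC 5737 documentation ranges and the labels are placeholders, not real geolocation data or MaxMind's file format.

```python
import bisect
import ipaddress


class RangeTable:
    """Sorted, non-overlapping IPv4 ranges held in memory; lookups are O(log n)."""

    def __init__(self, cidr_to_value):
        rows = []
        for cidr, value in cidr_to_value.items():
            net = ipaddress.IPv4Network(cidr)  # rejects IPv6 and host bits set
            rows.append((int(net.network_address), int(net.broadcast_address), value))
        rows.sort()
        self._starts = [start for start, _, _ in rows]
        self._rows = rows

    def lookup(self, ip):
        """Return the value for the block containing `ip`, or None if uncovered."""
        addr = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            _, end, value = self._rows[i]
            if addr <= end:
                return value
        return None


table = RangeTable({
    "203.0.113.0/24": "Example-A",   # placeholder label
    "198.51.100.0/24": "Example-B",  # placeholder label
})
print(table.lookup("203.0.113.7"))  # Example-A
print(table.lookup("192.0.2.1"))    # None
```

Shipping a table like this inside the executable (or the Lambda bundle) is the "no external DB" variant described above; only the AAA part would still need shared state.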

~~~
ddorian43
How I scaled my single-function + single-db-query api!

~~~
jonathan-kosgei
If anything it's closer to: How I built a multi-regional API Gateway API with
multiple creative workarounds despite there being no official way to do it, and
it only having become possible 5 months ago :)

~~~
kreetx
Hear hear! Clearly a significant amount of effort has gone into figuring out
this design, and it's interesting to read about it.

EDIT: And you're the author -- thanks!

------
sandGorgon
Super cool article!

I wonder whether you tried Apistar
([https://github.com/encode/apistar](https://github.com/encode/apistar)). It's
built by the author of Django REST framework and is also much more frequently
updated.

~~~
chimen
What is apistar supposed to be solving in this context?

~~~
jonathan-kosgei
I haven't worked with it yet, but it looks like a Python framework similar to
Japronto or Flask, one that is heavily focused on building APIs.

~~~
chimen
It was a top-level comment, which is why I asked; I thought it did something
special. Also worth a look: [http://python-eve.org/](http://python-eve.org/)

~~~
jonathan-kosgei
No prob, Eve looks pretty cool. I'd love to benchmark it alongside Japronto
and Apistar. Thanks!

------
samwillis
Out of interest, where did you get the regional data for IP addresses? Did
you collect it yourself, and if so, how? There seem to be very few sources that
allow you to resell it.

~~~
jonathan-kosgei
We aggregate the data from a couple of sources, see
[https://ipdata.co/about.html](https://ipdata.co/about.html)

------
xstartup
Why not push to Kinesis directly in a background worker?

I understand printing a log line is simple, but that log still has to be pushed
to CloudWatch, and that happens on the same system. So you end up paying that
cost either way.
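
A minimal sketch of the background-worker approach: the request path only enqueues a record, and a daemon thread drains the queue and ships records in batches. `send_batch` is any callable taking a list; wrapping boto3's Kinesis `put_records` in it is an assumption shown only in comments (the stream name there is hypothetical), and retrying partially failed batches is not handled.

```python
import queue
import threading

# Hypothetical production sender (not run here):
#   import json, boto3
#   kinesis = boto3.client("kinesis")
#   def send_to_kinesis(batch):
#       kinesis.put_records(
#           StreamName="usage-stream",  # hypothetical stream name
#           Records=[{"Data": json.dumps(r).encode(), "PartitionKey": r["api_key"]}
#                    for r in batch],
#       )


class BatchingShipper:
    """Daemon thread that drains a queue and hands records to `send_batch` in batches."""

    def __init__(self, send_batch, max_batch=500, flush_interval=1.0):
        self._send_batch = send_batch
        self._max_batch = max_batch  # Kinesis PutRecords takes at most 500 records per call
        self._flush_interval = flush_interval
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record):
        """Called on the request path: enqueue only, never wait on the network."""
        self._queue.put(record)

    def _run(self):
        # Keep going until asked to stop AND everything queued has been shipped.
        while not self._stop.is_set() or not self._queue.empty():
            batch = []
            try:
                # Block briefly for the first record, then take whatever is already queued.
                batch.append(self._queue.get(timeout=self._flush_interval))
                while len(batch) < self._max_batch:
                    batch.append(self._queue.get_nowait())
            except queue.Empty:
                pass
            if batch:
                self._send_batch(batch)

    def close(self):
        """Stop accepting work, flush what is left, and wait for the thread."""
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    batches = []
    shipper = BatchingShipper(batches.append, max_batch=2, flush_interval=0.05)
    for i in range(5):
        shipper.submit({"api_key": "key_a", "n": i})
    shipper.close()
    print(batches)
```

The trade-off versus logging to CloudWatch is that unsent records are lost if the process dies before `close()`, so a real setup would want a bounded queue and some retry on failed records.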

You can use MongoDB Atlas with multi-region replication for low-latency
queries instead of DynamoDB.

