

Scaling Clearbit to 2M API requests per day - sergiotapia
http://stackshare.io/clearbit/scaling-clearbit-to-2m-api-requests-per-day/?utm_medium=social&utm_campaign=stackshare_weekly_962015

======
manigandham
Maybe I'm missing something but as the other comments have pointed out, this
is very low scale. I'm not sure it can even be called at-scale in any sense
these days.

Also the architecture seems incredibly fragile and complicated for what it's
doing. I get that they've built some integrations and backend processes to
assemble this data but the API serving could all run on a single app and
database server since it's all just reads.

~~~
harlow
Hi manigandham, Harlow here (author of the post). You are absolutely correct;
serving 2M API requests from a local data store wouldn't be considered "at-
scale" these days.

In hindsight I should have added more information about where the actual work
is being done -- I definitely missed the mark on parts of this post.

The lion's share of our work is done asynchronously. 2M API requests
(lookups) turn into 40M+ background jobs. These jobs fetch, aggregate, and
scrub data from a number of downstream providers.
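The fan-out described above could be sketched roughly like this — the provider names and the plain in-process queue are placeholders, not Clearbit's actual job system:

```python
from queue import Queue

# Hypothetical downstream providers -- the post doesn't name them.
PROVIDERS = ["provider_a", "provider_b", "provider_c"]

job_queue = Queue()

def enqueue_lookup(email):
    """Fan a single API lookup out into one background job per provider."""
    for provider in PROVIDERS:
        job_queue.put({"provider": provider, "email": email})

# One lookup becomes len(PROVIDERS) jobs; this kind of fan-out is how
# 2M lookups/day can grow into 40M+ background jobs.
enqueue_lookup("jane@example.com")
print(job_queue.qsize())  # 3 jobs for one lookup
```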

------
thinkindie
2M req/day is roughly 23 req/s on average. I understand there might be busier
periods, but those don't seem like impressive figures.

~~~
remon
Thank you. Using days as your time unit for measuring requests over time is
not useful. I'd be more interested in the req/sec numbers during peak.

~~~
thinkindie
Even the request volume at peak time doesn't tell the whole story; we still
don't know how heavy those requests are.

------
Bonogongo
2M req/day = 2,000,000/24/60/60 ≈ 23 req/s average. With an assumed peak of
10x the average, that's around 230 req/s.

Hmm. Not something I'd use the word 'scaling' for, even if there is a 1:1
write-to-DB ratio. With 1000 writes/sec to a DB it gets interesting.
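The back-of-the-envelope math above, spelled out (the 10x peak-to-average ratio is the commenter's assumption, not a measured number):

```python
# Convert a requests-per-day figure into average and peak per-second rates.
REQUESTS_PER_DAY = 2_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

avg_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY
peak_rps = avg_rps * 10  # assumed 10x peak-to-average ratio

print(round(avg_rps))   # 23 req/s on average
print(round(peak_rps))  # 231 req/s at the assumed peak
```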

~~~
MichaelGG
Divide that over 18 machines, too!

I end up doing this with customers. "Well we need a solid hosting system. Our
site gets 2 _million_ visits a month!" Me: You know, that'd run alright off my
iPAQ. (Actually, with a caching frontend like CloudFlare, it really would.)

Edit: Not to diss their post. I think it's interesting and good that people
post such articles. It's just that the sense of scale is off. There was a big
article here on some large company, I think it was bitly, and the totals came
out pretty low, especially for the number of servers. Even Twitter was only
peaking at a few thousand tweets/s not long ago (granted, they do a lot of
work per tweet, but still).

~~~
Bonogongo
" Divide that over 18 machines, too!"

Yes, that's why I said DB writes, which is the first stepping stone where you
need to think a little about scaling.

------
wereHamster
> Git push to only one repository.

I'm interested in that aspect. Do you use a single repository for all your
code and configuration? Does it also hold the state of your infrastructure
(instances, fleet definitions etc)?

~~~
harlow
We have a Git repo for each of the services, and we have a separate repo for
each of the Fleet Unit files.

> Git push to only one repository.

This refers to a per-service repository in our deployer app.

Previously we had to push code to each of the servers running a service. Now
we push to the deployer app and it leverages Fleet to distribute the code
across the available boxes.
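A minimal sketch of what one of those Fleet unit files might look like — the service name, image, and paths here are made up for illustration, not taken from the post:

```ini
# myservice@.service -- hypothetical Fleet unit template
[Unit]
Description=MyService API worker
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name myservice-%i myorg/myservice
ExecStop=/usr/bin/docker stop myservice-%i

[X-Fleet]
# Schedule each instance onto a different machine in the cluster.
Conflicts=myservice@*.service
```

Fleet reads the `[X-Fleet]` section to decide placement, which is how a single push to the deployer app can fan a service out across the available boxes.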

