
Bitly: Lessons Building a Distributed System that Handles 6B Clicks a Month - aespinoza
http://highscalability.com/blog/2014/7/14/bitly-lessons-learned-building-a-distributed-system-that-han.html
======
daigoba66
"4 boxes with 1x RAM will always be cheaper than 1 box with 4X RAM."

Is that really true? RAM is dirt cheap compared to everything else that goes
into a server.

And then later under "Lessons Learned" we have: "Put it all on one box if you
can. If you don’t need a distributed system then don’t build one. They are
complicated and expensive to build."

~~~
titusjohnson
I haven't run the numbers, but I would imagine that 4 boxes with 256 GB of RAM
are significantly cheaper than 1 box with 1TB of RAM, especially if the rest
of the box isn't top-end equipment.

~~~
gtuhl
The density of the DIMMs is where the price difference jumps. Right now 32GB
sticks push the price way up. Common current chassis have 24 slots letting you
hit 384GB total with no big surprises on 16GB sticks.

Getting to 1TB would certainly require 32GB sticks and would be the point
where multiple machines gets cheaper.

------
drakaal
6 billion clicks a month is not the hard part; storing analytics for those clicks is.
Though this is a pittance compared to what Google or Adobe serve with their
analytics.

I don't know enough about how "realtime" Bitly is. Handling 6 billion writes a
month of "raw" per-user data as serialized strings would not be that hard.

2,300 clicks (and therefore analytics writes) per second, averaged over the
month, likely means, per the 80/20 rule, about 10k clicks per second at peak.

Now, assume that you do one serialized write immediately with all the data
from a user, and then do a chasing analysis so that you don't have to do all
the work at peak... and that each user then amounts to 20 writes.

We end up with 10k serial writes at peak, and 5x that in chasing writes, so we
need 60k writes per second.
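The arithmetic above can be sanity-checked in a few lines of Python (the 80/20 peak factor and the write multipliers are the assumptions stated above, not measured numbers):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.59M seconds

clicks_per_month = 6_000_000_000
avg_clicks_per_sec = clicks_per_month / SECONDS_PER_MONTH

# 80/20 rule of thumb: peak is roughly 4-5x the monthly average
peak_clicks_per_sec = 10_000

# One immediate "raw" serialized write per click at peak, plus 5x
# that in deferred ("chasing") analysis writes
raw_writes_per_sec = peak_clicks_per_sec       # 10k
chasing_writes_per_sec = 5 * peak_clicks_per_sec  # 50k
total_writes_per_sec = raw_writes_per_sec + chasing_writes_per_sec

print(round(avg_clicks_per_sec))   # 2315 (the thread rounds to ~2,300)
print(total_writes_per_sec)        # 60000
```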

On DynamoDB that would cost $90k upfront and $7.71 an hour. ($99,500 a month)

That is "a lot" but it isn't huge.

Doing this on Google App Engine would likely cost about the same, since you
pay a fixed fee per write rather than paying for provisioned throughput.
Depending on the amount of indexing, you would pay $1.80 - $2.40 per 100k
clicks, which based on the above math comes to $108k - $144k per month.

I am not familiar enough with Azure to quote a price.

I know there would be other costs. This is just the database portion. But as I
expect this to be the majority of the price, I thought it was the part most
worth discussing if you were building a Bitly on a Cloud Platform.

~~~
sanswork
I used to run an ad server where request numbers like this per month would be
considered low (we would regularly do hundreds of millions of requests per
day, peaking at around a billion per day). We had the added fun of requiring
extremely low latency, and we couldn't toss 400 servers at it either.

With chasing writes you do it in slow periods between traffic bursts since
you're basically just pulling them off a queue to process so you don't need to
count that in with your peak burst numbers.

Your costs seem really high too. The above system was ~$10k/month on GoGrid,
including 2 DB servers on dedicated hardware (not really impressive machines
either; I think they were ~$500/month each), a load balancer on a dedicated
server, 2 dozen or so webservers, a few support servers (admin panels, client
interfaces, puppet, etc.), and a small Hadoop cluster.

Redis would receive the raw data, the DB stored the rolled up data and the raw
data/logs would be compressed and go onto a small hadoop cluster in case we
needed to process it for a new type or report or look for something specific.
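A toy sketch of the chasing-writes pattern described above, with an in-memory queue and counter standing in for Redis and the rolled-up DB (the names and data shapes are illustrative, not the actual system):

```python
import collections
import queue

raw_events = queue.Queue()        # stands in for Redis receiving raw data
rollups = collections.Counter()   # stands in for the rolled-up DB table

def ingest(click):
    """Hot path: just enqueue the raw event and return immediately."""
    raw_events.put(click)

def drain(batch_size=1000):
    """Run between traffic bursts: pull raw events off the queue
    and fold them into per-URL counters."""
    for _ in range(batch_size):
        try:
            click = raw_events.get_nowait()
        except queue.Empty:
            break
        rollups[click["url"]] += 1

ingest({"url": "http://bit.ly/abc", "ts": 1405296000})
ingest({"url": "http://bit.ly/abc", "ts": 1405296001})
drain()
print(rollups["http://bit.ly/abc"])   # 2
```

The point is that only `ingest` has to keep up with the peak burst rate; `drain` runs whenever there is spare capacity.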

~~~
drakaal
I agree, I was trying to keep the math simple, and give some play for bigger
peaks.

An ad network is another great example of where this would be tiny numbers.

And with ads, more than with analytics, you have to be aware of race
conditions and do more management of reads and writes so that you don't over-
or under-serve a campaign.

------
tedchs
FWIW, this is about 2,300 qps on average. Nothing to sneeze at, but the per-
second scale removes the awe factor of "6 billion a month".

~~~
MichaelGG
And 30 servers handling the frontend! Even if they peak at 10,000/sec, that's
only a few hundred per server per second. And another 370 servers to do other
stuff.

~~~
themartorana
Right now, we peak at 11 servers for ~550-600 rps - those are AWS c3.medium
servers. We're moving from Python to Go to try to squeeze more out of each
server.

But our bottleneck is MySQL, and we're moving to Riak. Our DB is the only part
of our stack that isn't inherently horizontally scalable - which seems to be
the case for a lot of services hitting that 500 rps rate (maybe 750 qps or
so).

------
jpeterson
Does it bother anyone else that services like bitly add a single point of
failure to URLs? Also, it's kind of amusing that they've spent so much
engineering effort to distribute a protocol that's already massively
distributed by default.

------
davidw
The important bit for most people:

> Put it all on one box if you can. If you don’t need a distributed system
> then don’t build one. They are complicated and expensive to build.

For most businesses that aren't about massive scaling, your time is probably
better spent on marketing, new features and so on.

Interesting for me: they are, or were, users of Tcl.

~~~
jimktrains2
> For most businesses that aren't about massive scaling,

Most businesses should care about distributed systems because they (often)
need some form of high availability. Sure, you're not scaling to Google or New
York Times levels here, but if you want to survive any of the random crap that
can happen to you, you can't have a single host per role.

Yes, sure, you may not be worried about concurrent writes to multiple
replicas, but what about reads from multiple replicas, or a read from a slave
right after a write to the master (let's say the master just died in a
horrible manner, and the slave is now being hit with reads and writes)?

PS: TCL is awesome.

~~~
davidw
> (often) need some form of high availability

I'd guess that, in reality, many don't really need high availability. I mean,
it's not going to hurt if they have it, but if it comes at a _cost_ of not
doing other things with their limited resources, it may not be the best thing
to be doing.

BTW, it's Tcl, not TCL!

~~~
jimktrains2
Gah, I should know better about the name. Let's just pretend I was yelling it
because I was excited? k?

Anyway, it's a gradient. Sure, most places don't need 99.9999% availability,
but most places also can't afford to be down for a few hours to a day,
especially during their peak season.

So, do you need automatic failover and promotion? Probably not. Do you need
contingency plans (which may/should include accounts with multiple vendors,
even if it's just a Linode account with no boxes) and to have practiced
bringing everything back up (or, if you're e.g. a small retail shop, how to
check customers out)? Yes, most definitely.

Even if your distributed system isn't "moving", you still need to plan for
things. Like I said, if your database server becomes unavailable, you need to
know how much data hasn't made it to the spare; otherwise you need to ask "How
much will I lose if I restore from my last backup?" Things of that nature need
to be planned around and known, even if the code doesn't need to care.

------
cdman
6B a month =~ 2k per second. Just putting the numbers in perspective :-)

~~~
iampims
That's assuming a linear distribution of clicks and we all know it's never the
case. It could well be over 10K sustained for peak hours. Averages are often
misleading :)

~~~
ryanjshaw
Without wanting to detract from the engineering accomplishments here (which I
have never come close to), it's important to note that low latency does not
appear to be a design criterion, i.e. it's okay if it takes a couple of
minutes to process events during peak load, which means there is some leeway
for smoothing the input peaks over processing time.

Furthermore, these are just click events. It's okay to lose a few, so the
design doesn't have to be especially good at making sure events aren't lost,
as far as I can tell.

A good design is as much about what is left out as what is left in, which is
the lesson between the lines here in my opinion.

~~~
fizwhiz
My thoughts exactly. If you come to my house and ring the bell, but it takes
me 15 minutes to get off the couch to reach the door, I can't really claim to
be "available" with a straight face. Sure, I "technically" am available, but
that level of latency is not practical.

~~~
i0exception
The system is still available if your ring gets an acknowledgement of receipt.
The latency for the request to be served is a design metric and has no impact
on "availability".

~~~
fizwhiz
>> The system is still available if your ring gets an acknowledgement of
receipt

This is the equivalent of (per my example) yelling out "I hear ya!
Coming..." and then taking 15 minutes to reach the door. I never said that low
latency implies anything about the level of availability; I merely meant that
arguing about the availability of systems is incomplete without a thorough
discussion of latency.

In the case of Bitly, I'm just curious about systems that are highly
available and "require" low latency vs. systems that don't require it. As
ryanjshaw points out, the system may have a degree of tolerance for lossy
click events. If you have a heterogeneous mix of systems with different
tolerance levels, that surely affects the architecture, does it not?

------
AndrewDucker
Annoyingly, for something designed to scale as much as Bitly does, it has some
very odd wrinkles.

Whenever I visit my stats page at
[https://bitly.com/a/stats](https://bitly.com/a/stats) it shows me the stats
from the last time I visited, and I have to do a hard refresh with Ctrl-F5 to
get the latest version.

Or possibly this is part of how you scale - only retrieving the latest
versions of stats when you need them. And not with any accuracy (clicking on a
given day pretty much never gives me numbers of clicks that add up to the
total for that day).

------
gedrap
While reading people talking about writing data to permanent store, I got an
interesting (well at least for me!) question.

My initial idea was that it would be possible to store the data to be written
in RAM at first, and periodically flush it to the hard drive / DB. But then,
on the other hand, the OS does that already by caching writes and flushing
them, just at a much lower level. Some DBs probably do it too.

So the question is: is it worth implementing strategies like that (a home-made
cache), or is it a better idea to trust the OS/DB by default?
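For concreteness, the "home-made cache" in the question might look like this toy write-behind buffer (purely illustrative; as the reply below notes, the OS page cache and the DB's own buffering already do much of this):

```python
import sqlite3
import threading

class BufferedWriter:
    """Toy write-behind buffer: accumulate rows in RAM, flush in batches.
    SQLite (like most DBs) already buffers via the OS page cache, so
    this mostly just trades durability for fewer commits."""

    def __init__(self, conn, flush_every=100):
        self.conn = conn
        self.flush_every = flush_every
        self.buf = []
        self.lock = threading.Lock()

    def write(self, url, ts):
        with self.lock:
            self.buf.append((url, ts))
            if len(self.buf) >= self.flush_every:
                self._flush()

    def _flush(self):
        # One transaction per batch instead of one per click
        self.conn.executemany("INSERT INTO clicks VALUES (?, ?)", self.buf)
        self.conn.commit()
        self.buf.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (url TEXT, ts INTEGER)")
w = BufferedWriter(conn, flush_every=2)
w.write("http://bit.ly/abc", 1)
w.write("http://bit.ly/abc", 2)   # second write triggers a flush
print(conn.execute("SELECT COUNT(*) FROM clicks").fetchone()[0])  # 2
```

The obvious downside is that anything still sitting in `buf` when the process dies is lost, which is exactly the trade-off the question is about.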

~~~
rakoo
Trust the OS, and don't use '70s techniques like separating RAM and disk.
Here's a really interesting piece on how PHK built Varnish using "modern"
software techniques:

[https://www.varnish-cache.org/trac/wiki/ArchitectNotes](https://www.varnish-cache.org/trac/wiki/ArchitectNotes)

And here's a key-value DB you can use in your own programs that reuses the
same idea, i.e. _ask the OS for some space and trust it to handle RAM/disk
allocation_; it turns out to be excellent at what it does thanks to careful
design:

[http://symas.com/mdb/](http://symas.com/mdb/)
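The "trust the OS" idea boils down to mapping a file into memory and letting the kernel decide what stays in RAM. A minimal illustration with Python's stdlib `mmap` (not how LMDB is actually implemented, just the core mechanism in action):

```python
import mmap
import os
import tempfile

# Create a fixed-size scratch file and map it into memory; the kernel
# pages it in and out of RAM as needed - no separate "RAM cache" layer
# in the program itself.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as f:
    f.truncate(4096)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:5] = b"hello"   # a plain memory write, backed by the page cache
    mm.flush()           # msync: ask the OS to persist the dirty pages
    mm.close()

with open(path, "rb") as f:
    data = f.read(5)
print(data)              # b'hello'

os.remove(path)
```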

------
korzun
This article is about horizontal scaling and distributed systems.

Some commenters here are missing the point.

------
innguest
"Or, you could just avoid the OS altogether" [and get 40 million requests per
second]:

[http://highscalability.com/blog/2014/2/13/snabb-switch-
skip-...](http://highscalability.com/blog/2014/2/13/snabb-switch-skip-the-os-
and-get-40-million-requests-per-sec.html)

~~~
Mandatum
6B requests/mo = ~2,200/sec

40M requests/sec = ~105 trillion/mo (or the number of red cells in the human
body times 5)

------
mianos
2300 a second sustained. I was doing that on PA RISC under HPUX in 1995. BFD.

~~~
matthewmacleod
1. Don't be a dick.

2. It's obviously more than 2.3k sustained; there will almost certainly be
variance, maybe up to about 10k per second.

3. It does seem rather unlikely that you were serving 2,300 clicks per second
in 1995, given the minuscule scope of the web at that time. That said, if you
were, I'm sure many of us would be interested in hearing about it. It would
probably be more productive than bitching.

~~~
mianos
The Shanghai Stock Exchange trading system. In memory, with simple text-based
messaging over TCP. The main reason for my disdain is the trivially shardable
nature of URL shortening. Analytics is not real-time, so it doesn't count.
Downvote away.

~~~
matthewmacleod
So in other words, you did something totally different.

~~~
peterwwillis
I helped admin a site in 2001 that did tens of thousands of hits per second to
dynamic content on a mod_perl app. And Bitly's serving static content (301s)
of what, a 500-byte payload? Our static site layer did hundreds of thousands
of hits per second with a minimum half-meg payload.

The guy's got a valid point. Bitly's using way overcomplicated tech and
employs way more engineers than you need to host a trivial amount of traffic
for a large scale site. This is a textbook case of over-engineering.

~~~
ianstallings
It wouldn't surprise me if that was a standing order from the top - make sure
it can scale to everyone on the planet. I built a service not too long ago
where they were expecting 50M users in the first year. Real numbers? It's
hovering around 300k users. But the good news is it can handle those 50M if
they ever come :-) It is highly over-engineered.

