
How Algolia Reduces Latency - glenngillen
https://stackshare.io/algolia/how-algolia-reduces-latency-for-21b-searches-per-month
======
simonw
This is really interesting. Impressive to see a startup with so much bare
metal hardware (800+ servers in 47+ data centers), driven by their need to power
autocomplete-style search and hence offer the lowest latency possible.

One detail I particularly liked (buried deep in the article) was this one:
"Once a machine is detected to be down, we push a DNS change to take it out of
the cluster. The upper bound of propagation for that change is 2 minutes (DNS
TTL). During this time, API clients implement their internal retry strategy to
connect to healthy machines in the cluster, so there is no customer impact."

Offering client API libraries that have a retry strategy baked-in and relying
on that for part of your high availability strategy is very neat.

~~~
dzello
Thanks for the comment simonw. It's not directly mentioned in the article but
we're also using the API clients to implement a DNS fallback strategy. If the
hosts are unreachable through their primary hostnames (.algolia.net), the
clients try alternate names (.algolianet.com) that are hosted by a different
provider.
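
For illustration, the client-side logic is roughly the following (a simplified
sketch, not the actual client code; the hostnames, endpoint and timeout here
are illustrative assumptions):

    # Simplified sketch of the retry/fallback idea (not the real Algolia client).
    import requests

    APP_ID = "YOUR_APP_ID"
    HOSTS = [
        f"{APP_ID}-dsn.algolia.net",    # primary hostname (DNS-based failover)
        f"{APP_ID}-1.algolianet.com",   # fallbacks hosted by a different DNS provider
        f"{APP_ID}-2.algolianet.com",
        f"{APP_ID}-3.algolianet.com",
    ]

    def search(index, params, timeout=2):
        last_error = None
        for host in HOSTS:
            try:
                resp = requests.get(f"https://{host}/1/indexes/{index}",
                                    params=params, timeout=timeout)
                resp.raise_for_status()
                return resp.json()
            except requests.RequestException as exc:
                last_error = exc  # machine unreachable or unhealthy: try the next one
        raise last_error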

------
danielvf
"When a query takes more than 1 second we send an alert into Slack."

One day, during a problem, every single query will take over a second, and
this will be an exciting Slack channel to be in.

~~~
geeio
I'd hope they have some sort of rate limiting built in
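
Even something as simple as a per-alert cooldown in front of the webhook would
help. A rough sketch (the webhook URL and threshold are made up):

    # Illustrative alert throttling: post to a Slack incoming webhook at most
    # once per alert key per cooldown window, instead of once per slow query.
    import time
    import requests

    WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
    COOLDOWN_SECONDS = 300
    _last_sent = {}

    def alert_slow_query(query_id, duration_ms, key="slow_query"):
        now = time.time()
        if now - _last_sent.get(key, 0) < COOLDOWN_SECONDS:
            return  # suppress the alert rather than flooding the channel
        _last_sent[key] = now
        requests.post(WEBHOOK_URL, json={
            "text": f"Slow query {query_id}: {duration_ms} ms",
        })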

~~~
cromulent
Yeah, I hit it today.

[https://api.slack.com/docs/rate-limits](https://api.slack.com/docs/rate-limits)

------
faizshah
Would be interesting to see the story of how you guys started from square one
and built out your bare metal infrastructure.

Really love the docsearch project btw, great UX.

EDIT: Just found this blog post right after posting this comment
[https://stories.algolia.com/algolia-s-fury-road-to-a-worldwide-api-c1536c46f3a5](https://stories.algolia.com/algolia-s-fury-road-to-a-worldwide-api-c1536c46f3a5)

~~~
dzello
Thanks! Glad you're liking DocSearch. That's a good post about the journey.
There are a few more listed in our awesome-algolia repo:
[https://github.com/algolia/awesome-algolia#blog-posts](https://github.com/algolia/awesome-algolia#blog-posts).

------
sergiotapia
Algolia is a fantastic product and this article helped me peek behind the
curtain as to what makes it tick.

~~~
mythrwy
Agreed. Very interesting to see behind the scenes.

Implemented search for a client using Algolia last month and was completely
blown away. The speed at which queries returned was amazing. (Although it
wasn't my idea to use Algolia, I'll definitely be looking for opportunities to
use it again.)

~~~
sytse
We use Algolia for the docs and commercial website of GitLab and could not be
happier. You immediately notice the speed. Interesting to see how much effort
went into that. Also cool to see the term hybrid tenancy.

------
lazyjones
Isn't a low DNS TTL problematic? DNS lookups are often slow on clients.
Wouldn't something like Wackamole (IP address takeover on local networks with
microsecond-latency on failure; dead project now though apparently:
[https://github.com/postwait/wackamole](https://github.com/postwait/wackamole))
help avoid this? We built our load balancers this way at my previous
company...

~~~
jlemoine
You're right that a low DNS TTL is not perfect (we've seen a few providers
that override the TTL to reduce the number of DNS queries leaving their
network; it's a big hack and it causes some trouble). This problem is
addressed by our API clients, which have separate DNS endpoints to reach the
three machines of a cluster.

We cannot use any local-network IP takeover or load balancer because we
distribute each cluster across several providers with different autonomous
systems. This is how we are able to offer an SLA of up to 99.999% with a big
refund strategy:
[https://blog.algolia.com/for-slas-theres-no-such-thing-as-100-uptime-only-100-transparency/](https://blog.algolia.com/for-slas-theres-no-such-thing-as-100-uptime-only-100-transparency/)

------
prashnts
> The S3 bucket sits behind CloudFlare to make downloading the binaries fast
> from anywhere.

Why not CloudFront? Moving binaries through CloudFlare's CDN might not work
out for you. It's also been reported that they don't really like you moving
large amounts of data through their service [0].

[0] [https://news.ycombinator.com/item?id=12825719](https://news.ycombinator.com/item?id=12825719)

~~~
threeseed
CloudFlare has 105 PoPs. CloudFront has 45 PoPs.

That's a big difference.

------
gumby
This is tangential but: now that the term "bare metal" has been co-opted to
mean "uses an OS but no virtualization", what are those of us who run our
software on actual bare metal supposed to call what we do?

------
kevinsimper
Really interesting that they run their application as an nginx module. That
really goes to "keep it simple", and shows that you may not always need a
cluster (I know they have a cluster, but that handles the clients).

------
hamandcheese
> Before deployment begins, another process has encrypted our binaries and
> uploaded them to an S3 bucket.

Is this to ensure data integrity, or some other purpose?

~~~
dzello
The encryption is for security. The upload is so we can front the S3 bucket
with a CDN for fast download of the binaries from any region.
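
In broad strokes it looks something like this (a simplified sketch with
made-up names, not our actual deploy tooling):

    # Simplified sketch: encrypt the release binary, push it to S3, and let
    # deployment hosts fetch it through a CDN hostname that fronts the bucket.
    import boto3
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()  # in practice the key is managed and distributed separately
    with open("search-engine.bin", "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())

    s3 = boto3.client("s3")
    s3.put_object(Bucket="releases-bucket", Key="search-engine.bin.enc", Body=ciphertext)

    # Hosts then download https://releases.example-cdn.net/search-engine.bin.enc
    # (the CDN-fronted URL) and decrypt with the shared key before installing.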

------
alainmf
Great article!

