
How Algolia (YC W14) Built Their Realtime Search - redox_
http://blog.leanstack.io/how-algolia-built-their-realtime-search-as-a-service-product
======
karterk
For the search index, I wonder why they did not go with ElasticSearch or Solr,
and decided to roll their own replication/distribution layer. RAFT is much
simpler than Paxos, but it's still something pretty difficult to get right (as
with any distributed system). Given that both ES and Solr have already got
this figured out, what were the specific reasons for rolling your own
C++/Nginx layer?

~~~
jlemoine
Search engine needs have evolved a lot since Lucene was designed (Lucene is
the search library behind ElasticSearch and Solr). Users now want an instant
search that searches at the first keystroke and tolerates typos. Lucene does
not propose anything special for that, so you mainly have two options:

\- slow down indexing: index all ngrams of words (this also requires a lot of
disk)

\- slow down search: you need to find all words that match the prefix query in
the dictionary and then retrieve all the inverted lists.

ES and Solr have added an autocomplete module that is basically a compiled
trie associating a string to a result set, but this is not a real
instant-search engine: there is no intersection of inverted lists, so the
autocomplete results can mismatch the search results.
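To make the first trade-off concrete, here is a minimal sketch (my own toy illustration, not ES/Solr/Algolia code) of the "index all ngrams" option: every edge n-gram (prefix) of every word becomes a posting-list key, so prefix queries are a single lookup, at the cost of a much larger index.

```python
from collections import defaultdict

def edge_ngrams(word, min_len=1):
    """All prefixes of a word, e.g. 'paris' -> p, pa, par, pari, paris."""
    return [word[:i] for i in range(min_len, len(word) + 1)]

class PrefixIndex:
    """Naive instant-search index: every prefix of every word points to
    the documents containing that word. Prefix lookups are O(1) hash
    probes, but the index blows up in size -- the trade-off above."""

    def __init__(self):
        self.inverted = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            for gram in edge_ngrams(word):
                self.inverted[gram].add(doc_id)

    def search(self, prefix):
        return self.inverted.get(prefix.lower(), set())

idx = PrefixIndex()
idx.add(1, "San Francisco")
idx.add(2, "San Diego")
print(sorted(idx.search("san")))  # [1, 2]
print(sorted(idx.search("fra")))  # [1]
```

The other option (prefix expansion at query time) avoids the index blow-up but pays at search time instead, since every matching dictionary word contributes an inverted list to merge.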

On our side, we have redesigned all search algorithms and data structures from
scratch to run on mobile devices. For example, we are able to search all
cities from Geonames (> 3M) on an old iPhone 3GS in less than 50ms (including
prefix search & typo tolerance) with only 1KB of RAM and a flash disk. You
can't do that with Lucene, so I let you imagine what we can do with a high-end
server! As for the high availability & consensus, this is something complex:
we have spent a lot of time testing it by killing machines of a cluster, and
we still keep testing it every day, since every deployment kills the index
builder process!
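Algolia's actual algorithms are not public, but the "prefix search & typo tolerance" combination can be illustrated with a toy matcher of my own: does the query match a prefix of a word with at most one edit (substitution, insertion, or deletion)? This is a sketch only; a real engine would do this over a compact dictionary structure, not word by word.

```python
def within_one_typo_prefix(query, word):
    """True if `query` matches a prefix of `word` with at most one
    edit. A toy stand-in for typo-tolerant prefix matching; not the
    actual Algolia algorithm."""
    if word.startswith(query):          # exact prefix match
        return True
    q, n = query, len(query)
    prefix = word[:n]
    # one substitution inside the prefix
    if len(prefix) == n and sum(a != b for a, b in zip(q, prefix)) == 1:
        return True
    # one deletion: the query has one extra character
    for i in range(n):
        if word.startswith(q[:i] + q[i + 1:]):
            return True
    # one insertion: the query is missing one character
    for i in range(n + 1):
        for c in set(word):
            if word.startswith(q[:i] + c + q[i:]):
                return True
    return False

print(within_one_typo_prefix("psr", "paris"))   # True (one substitution)
print(within_one_typo_prefix("pris", "paris"))  # True (one missing letter)
print(within_one_typo_prefix("xyz", "paris"))   # False
```

Brute-forcing this over 3M Geonames entries would be far too slow, which is exactly why custom data structures (rather than per-word edit distance) are needed to hit 50ms on a phone.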

------
siculars
From my reading it seems that they are fully consistent between the three
nodes in their cluster. This seems to suggest that any failure will block
writes. It also seems like since each node has a full copy of the index, that
means your entire index must exist within the confines of one machine. So how
large of an index can you have in reality? Am I missing something?

~~~
thu
It seems they accept writes as long as at least two clusters agree to accept
it (which is the majority among the 3 clusters they have).

I would think that each cluster has the whole index, so any cluster can be
used for reads.

I would also think they have multiple such triple-clusters; one triple for
each important customer.

(It is difficult to know if a (logical) node is actually a cluster or not.)
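The majority-acceptance idea above can be sketched as follows (my own illustration of quorum writes in general, not Algolia's implementation): a write commits only if a majority of the 3 replicas acknowledge it, so one dead replica does not block writes, but two do.

```python
def quorum_write(replicas, payload, write_fn):
    """Commit `payload` only if a majority of replicas acknowledge it.
    With 3 replicas the quorum is 2, so one failure is tolerated."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            if write_fn(replica, payload):
                acks += 1
        except ConnectionError:
            pass  # a dead replica simply doesn't ack
    return acks >= needed

# Usage: one of three replicas is down; the write still commits.
def fake_send(replica, payload):
    if replica == "node-c":
        raise ConnectionError("node-c is down")
    return True

print(quorum_write(["node-a", "node-b", "node-c"], {"op": "add"}, fake_send))
# True
```

Reads, by contrast, can go to any single replica if each one holds the full index, as the comment above suggests.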

------
gabriel34
The SSD option intrigues me. Lately I find that using many consumer class HDs
in more servers is better for this kind of application than using enterprise
grade HDs or SSDs

~~~
jlemoine
Consumer disks would not give the same performance. We build & read a lot of
indices in parallel, which produces a lot of random IO on the disks. With two
disks in RAID 0, we are able to reach 1GB/s of random IO.

~~~
gabriel34
I see, but what I found interesting was that the price/performance ratio of
this solution proved better than getting, say, four HDs in each machine or an
extra machine, which is far more common in today's clusters (e.g.
Hadoop/Vertica clusters). I guess the profile of the IO (as you said, lots of
random IO) contributed a lot to this. Could you elaborate on the decision
process that led to this?

~~~
jlemoine
Why do you think consumer disks are better for a search engine?

Everything is a balance between CPU/RAM/disk. Having one of these resources
far superior to the others would be a waste. Our balance is roughly 16
cores/32 threads per host with the latest Xeon >= 3.5GHz, 256GB of RAM and
about 1TB of SSD (2x512GB).

We perform a lot of IO, so the compromise is between price and the number of
IOPS we can get. Matching the IOPS of our two SSDs in RAID 0 would require a
lot of consumer disks, which leads to two problems: 1) We would have too much
disk space and not enough CPU to use all of it 2) It would not fit in a 1U
server and would increase the price of the server a lot
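A back-of-envelope calculation makes the point. The figures below are illustrative assumptions of mine (not Algolia's measurements): a 7200rpm consumer HDD delivers on the order of 150 random IOPS, while a SATA SSD delivers on the order of 90,000.

```python
# Illustrative numbers, not measured values.
HDD_RANDOM_IOPS = 150      # ~7200rpm consumer hard drive
SSD_RANDOM_IOPS = 90_000   # ~consumer SATA SSD

raid0_ssd_iops = 2 * SSD_RANDOM_IOPS                    # two SSDs striped
hdds_to_match = -(-raid0_ssd_iops // HDD_RANDOM_IOPS)   # ceiling division

print(raid0_ssd_iops)  # 180000
print(hdds_to_match)   # 1200 -- far more spindles than fit in a 1U chassis
```

Even if the real numbers differ by a factor of a few, the gap is so large that for a random-IO-bound workload the SSD pair wins on both density and price, which matches the reasoning above.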

~~~
gabriel34
I was thinking more of an architecture comprising either:

1\. more servers with less capacity (CPU, RAM, disk) each

2\. the same number of servers, but six+ hard drives in RAID 0 in each

Both of which could be better for other apps but include their own set of
problems and, as you pointed out, could result in higher costs.

I think I was misunderstood; I was not questioning the architecture chosen,
sorry if I sounded that way. I want to learn why it was chosen, since I don't
see many like it. So far you have provided me with some very satisfying
answers, and for that I thank you.

