
How Algolia Built Their Realtime Search as a Service - rayascott
https://stackshare.io/posts/how-algolia-built-their-realtime-search-as-a-service-product
======
jsty
Algolia had a great 8-part series of 'under the hood' blog posts:

Part 1: [https://blog.algolia.com/inside-the-algolia-engine-part-1-in...](https://blog.algolia.com/inside-the-algolia-engine-part-1-indexing-vs-search/)

No affiliation, just thought they were really interesting.

~~~
amelius
I just skimmed through those posts. Indeed interesting, but I couldn't find
what type of data-structure they use for their main search (as opposed to
instant search suggestions).

~~~
ddorian43
An inverted index? From part 2:

    For each document, we extract the list of words and build a hash-table that associates words to documents
    When all documents are processed, we compute an on-disk binary data-structure containing the mapping of words to documents. This data-structure is the index we will use to process queries.
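
To make the quoted description concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration of the general technique, not Algolia's implementation: the `tokenize`, `build_index`, and `search` names are made up for this example, the sample documents are invented, and the on-disk binary serialization step mentioned in the quote is left out (the index stays in memory).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a word -> sorted list of document IDs mapping (the inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    # Freeze the hash table into sorted posting lists once all documents are processed.
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return IDs of documents containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    postings = [set(index.get(word, ())) for word in words]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Realtime search as a service",
    2: "Search engines use an inverted index",
    3: "Raft is a consensus algorithm",
}
index = build_index(docs)
```

With this sketch, `search(index, "inverted search")` intersects the posting lists for both words, so only documents containing all query terms come back.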

------
thomasfromcdnjs
If you have never used Algolia, I'd recommend doing so. For any reason.

They are one of those great companies that you want to emulate, their entire
setup is polished and perfect.

~~~
vonseel
Don't they provide the HN search? I have never found it particularly
effective.

~~~
donmatito
The HN search implementation is a bit sad, because it lacks the most magical part
of Algolia: the real-time search.

Using Algolia in a full-page-load HTML request is like using a Ferrari to drive to
the grocery store. You can, but what a waste.

~~~
Artemis2
Have you tried [https://hn.algolia.com](https://hn.algolia.com)?

~~~
donmatito
Yes I had, but I forgot about it, thanks. Much better than HN's own
implementation.

Algolia also implemented a LinkedIn contact search infinitely superior to
LinkedIn's own search.

I remember thinking that both demos were a really brilliant growth strategy,
because they showed clearly, by contrast, how painful the status quo was.

------
MrBuddyCasino
"The problem with Zookeeper is that a change in the topology can take a lot of
time to be detected. For example, if a host is down, it can take up to several
seconds to be detected. This is way too long for us, as in one second will
have potentially thousands of indexing jobs to process on the cluster and we
need to have a master that attributes an ID to be able to handle them. So we
have built our own election algorithm based on RAFT."

That's a bold move. Afaik the timeouts in Zookeeper can be tuned, no?
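
For context on why timeouts matter here: in Raft, a follower starts an election when it has not heard from the leader within a randomized election timeout, so the timeout range bounds how quickly a dead leader is noticed. The sketch below shows only that timer logic. It is not Algolia's algorithm; the `FollowerTimer` class and its method names are hypothetical, and the 150-300 ms default is the range suggested in the Raft paper, not a value from the article.

```python
import random


class FollowerTimer:
    """Tracks the randomized election timeout of a single Raft follower."""

    def __init__(self, min_ms=150, max_ms=300, rng=None):
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.rng = rng or random.Random()
        self.last_heartbeat_ms = 0
        self.timeout_ms = 0
        self.reset(0)

    def reset(self, now_ms):
        # Pick a fresh random timeout so followers rarely time out simultaneously.
        self.last_heartbeat_ms = now_ms
        self.timeout_ms = self.rng.randint(self.min_ms, self.max_ms)

    def on_heartbeat(self, now_ms):
        # A heartbeat from the leader restarts the countdown.
        self.reset(now_ms)

    def should_start_election(self, now_ms):
        # True once the leader has been silent for at least the current timeout.
        return now_ms - self.last_heartbeat_ms >= self.timeout_ms


# Seeded RNG so the example is reproducible.
timer = FollowerTimer(rng=random.Random(42))
```

Lowering `min_ms` and `max_ms` shortens failure detection at the cost of more spurious elections on a slow network, which is the same trade-off as tuning session timeouts in Zookeeper.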

------
karterk
Shameless plug: if you are looking for a simple, fast, fuzzy search engine
that you want to host yourself, take a look at Typesense:
[https://github.com/typesense/typesense](https://github.com/typesense/typesense)

~~~
Antrikshy
Only tangentially related, but...

If you're looking for a front-end library for autocompletion, Twitter's
typeahead.js is a nice one:
[https://github.com/twitter/typeahead.js](https://github.com/twitter/typeahead.js)

If you want one that works seamlessly with your React setup, React Autosuggest
is pretty neat:
[https://github.com/moroshko/react-autosuggest](https://github.com/moroshko/react-autosuggest)

~~~
vvoyer
Careful though: typeahead.js is not maintained at all, even though that isn't
stated clearly on the GitHub repo.

------
nl
I noticed that Google (who know a bit about search I think) link to Algolia as
the recommended way to do search if you are using Cloud Firestore.

That seems a good recommendation.

~~~
exclusiv
Yeah, it's solid and easy to set up using Cloud Functions triggered by
Firestore events.

Querying on Firestore is really limited though, which is quite surprising.
They even have geopoint data types but you can't query on them.

I'm sure they'll add a lot more soon but Algolia has been great for filling in
those gaps and adding robust search.

~~~
tootie
Straying OT, but I find Google Cloud's array of database options to be
hopelessly confusing. Particularly with the Fire* offerings that are all
really cool, but seemingly limited to mobile use cases.

------
donmatito
Every time I need to use Algolia in a project, there is this sense of "wow".
It's such a magical feeling. Every single time.

------
curiousgal
Well honestly, when searching through HN posts, their fuzzy search feature can
be annoying.

~~~
DanBC
It can be turned off.

------
amelius
Offtopic: isn't it about time that Linux distros get a serious "built-in"
search engine?

I mean, there's the "locate" command, but many people disable it because it's
a performance hog. Shouldn't "search" be an integral part of OS and/or
filesystem architecture?

~~~
e12e
I actually use mlocate a bit, one benefit is that it's a simple system, easy
to understand. And it's seen some use, so it tries to keep permissions
consistent between the index/search and the filesystem.

There have also been a few attempts at integrating search with the various
desktop projects, generally backed by Xapian or some other full-text search
library. I'm not sure which options are currently the best or best maintained.

I seem to recall gnome "tracker" had the most traction last I checked, not
sure if eg the kde project has something similar. Looks like canonical booted
tracker from the default install in 18.04 lts:

[https://community.ubuntu.com/t/install-tracker-by-default-in...](https://community.ubuntu.com/t/install-tracker-by-default-in-18-04-lts/1483/28)

Microsoft had its aborted attempt at a new fs built on top of sql server, for
"proper" search at the fs level. I'm not aware of any real file systems that
do full search out of the box. And I guess it's not clear that it'd be any
better than initial indexing+index refresh on change/on a schedule.

~~~
emilsedgh
KDE has Baloo [0]

[0] [https://community.kde.org/Baloo](https://community.kde.org/Baloo)

------
serguzest
I've always thought they were using elasticsearch!

~~~
dvirsky
The speed was a telltale that it's not the case ;)

~~~
amelius
Does anybody know where statistics/benchmarks are posted for elasticsearch?
This would be useful for performance comparisons with other products, and to
see if anything is wrong with the configuration in case of slow queries.

~~~
ddorian43
[https://benchmarks.elastic.co/index.html](https://benchmarks.elastic.co/index.html)

[https://github.com/elastic/rally](https://github.com/elastic/rally)

------
RyanShook
Can the concept of distributed consensus be seen as an alternative to
blockchain? [https://raft.github.io](https://raft.github.io)

~~~
zapita
No, blockchains are built on top of distributed consensus. You can have
distributed consensus without a blockchain, but not the other way around.

