

Real-time Search as a Service - davidbarker
http://www.algolia.com

======
nkurz
_To keep your users engaged, search results need to show up instantly and be
relevant to them, even when they do typos._

To try this out, I went to the demo page for searching TV episodes
([http://www.algolia.com/demo](http://www.algolia.com/demo)) and searched for
"The Wire Season 2". Here are the four results given, with the highlighted
portions bracketed:

[The Wire] Gag Reel [Season 2]

[The] Simple Life [Season 2] - Special - The Stuff We [Were]n't Allowed To
Show You

[The] Farmer Wants a [Wife] (Australia) [Season] 6, Episode [2]

[The] Cosby Show [Season 2] DVD Extra: New Interview with [Dire]ctor Jay
Sandrich

Rather than seeking "engagement", I'd put more emphasis on having high quality
search results. Having 3 of the 4 results ignore the properly typed title of
the show is a terrible interface. Correcting "Wire" to match "Director" is
absurd.

The sad part is that these results might make you think that the episodes for
season 2 of "The Wire" aren't in the database, but they are. They are just not
indexed in a way that lets them be found using the exact phrase "Season 2".

Trying to be more constructive, there is a typo in the first sentence of your
Intro, where the name of your company is spelled wrong. Also, "Real-Time
Search" usually means search against a database that is being constantly
updated. Anyway, I need to get back to screaming at the kids on my lawn.

~~~
js7
Also try this:

"Futurama holiday" shows one result, which has the word "Episode" in the
description. Try "Futurama holiday episode" and you get no results.

~~~
redox_
"Episode" is actually part of the HTML label used for display, not the
description.

~~~
js7
Yeah, but that information should be picked up in the search.

------
MWil
If you're wondering if Algolia is right for you, just ask them. Within 5
minutes of initiating the chat window I had the CEO, Julien, helping guide me
through the process of getting my XML into JSON to see if it was right.

Then he asked me more about my use case and actually steered me towards an
Elasticsearch solution since it sounded like a better fit.

All in all, we went back and forth for 3-4 days, only for him to lose me by
necessity, and I already feel like a satisfied customer.

~~~
MWil
just to be clear, this was about a week ago

obviously the CEO is watching this thread now so he should be quick to grab
people now as well, I imagine

------
anxrn
I don't understand what makes this particular service tout 'realtime' as its
primary selling point.

Don't all search engines (and other hosted search services) aim for fast (100s
of milliseconds) retrieval, show-as-you-type and realtime indexing?

Don't get me wrong. Getting all this right is very hard, and kudos for the
great performance numbers (vs Elasticsearch), but 'realtime search' smacks of
marketing copy.

~~~
jlemoine
You can try to search-as-you-type on our hacker news search to see the
difference with other search engines:
[http://hn.algolia.com/](http://hn.algolia.com/)

You have relevant results after each keystroke, even with typos. Classical
engines use approximation to perform instant search, like the suggest module
of Elasticsearch.

~~~
anxrn
It seems like for the HN search, your ranking function is the number of votes
(or very highly correlated with it). If this is true, it's not solving a
problem as hard as 'classical' engines, which compute a lot more. It would be
great to demonstrate this sort of performance on comparable rank functions. I
don't know anything about Elasticsearch ranking though, maybe they have a very
simple rank function too.

~~~
jlemoine
It is more than just a sort on number of points :) Our value is to be able to
mix textual relevance with business data (in that case the number of points,
but it can be the number of page views, number of followers, ...).

You can have a look at our blog post, which explains our ranking in detail:
[http://blog.algolia.com/search-ranking-algorithm-unveiled/](http://blog.algolia.com/search-ranking-algorithm-unveiled/)
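
As a rough illustration of what "mixing textual relevance with business data" could look like, here is a minimal sketch. It is not Algolia's actual algorithm; the `Story` type, the prefix-matching rule, and the sample point counts are all assumptions made up for the example. Results are ordered first by a textual criterion (how many query words match), and the business metric (points) only breaks ties.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    points: int  # business metric (hypothetical values below)


def matched_words(query: str, title: str) -> int:
    """Count query words that are a prefix of at least one title word."""
    title_words = title.lower().split()
    return sum(
        any(word.startswith(q) for word in title_words)
        for q in query.lower().split()
    )


def rank(query: str, stories: list[Story]) -> list[Story]:
    """Textual criterion first, points as the tie-breaker."""
    hits = [s for s in stories if matched_words(query, s.title) > 0]
    return sorted(hits, key=lambda s: (-matched_words(query, s.title), -s.points))


stories = [
    Story("Search engines compared", 500),
    Story("Real-time search as a service", 120),
    Story("Real-time analytics", 900),
]

ranked = rank("real-time search", stories)
for story in ranked:
    print(story.points, story.title)
```

With a pure sort on points, "Real-time analytics" would come first. Here the story that matches both query words wins despite having the fewest points, and points only decide the order of the two partial matches.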

------
nestlequ1k
No offense, but I hate your business model. Convincing devs to put their
search db in the hands of a small hosted startup is a recipe for disaster (see
indextank).

There must be a better way. ElasticSearch and MongoDB use open source business
models that I think tend to work much better for smart devs picking
technologies (irrespective of their actual products).

~~~
hboon
What was wrong with indextank?

~~~
kwi
They got acquired by LinkedIn and they shut down the service.

------
johns
Found this because hnsearch.com is migrating to it. It's very fast.
[http://hn.algolia.com](http://hn.algolia.com)

~~~
jared314
Unfortunately it does not seem to have the accuracy, or breadth, of the old
hnsearch.com. Hopefully this will be fixed in time, but I have found it
lacking relevant results and myself switching back to hnsearch on most
occasions.

I also wonder about all the other small applications in the "HN ecosystem",
like karma tracker, that rely on the hnsearch API. I see that algolia has an
API, but will those other projects just die too?

~~~
redox_
Feedback is always welcome. Do you have a concrete example of a query
returning bad results and what would be the good results?

------
johnnymonster
It's all well and good to be very fast, but at what price? From their page it
costs $450 for 5mil records. In the search world, this is nothing. So I guess
it's going to come down to whether your company is at the point where it needs
to shave off 1-200ms for hundreds of dollars a month.

Second, I would wait and see how their reliability hashes out before I rely on
them for any production services.

~~~
jlemoine
The search world is very big :) 5 mil records is nothing if you index logs
(which is not a typical Algolia use case), but it is big from an e-commerce
perspective, for example.

~~~
johnnymonster
In the e-commerce world, the difference between 2ms and 200ms isn't that big
of a deal. Search relevance, however, might be something that is important. It
looks like that is something they are focusing on heavily.

~~~
jlemoine
200ms does not allow you to provide search-as-you-type, and our e-commerce
customers see that performance is linked to conversion.

In terms of relevance, we have developed a ranking algorithm specialized for
that case that provides better results than the traditional approach:
[http://blog.algolia.com/search-ranking-algorithm-unveiled/](http://blog.algolia.com/search-ranking-algorithm-unveiled/)

~~~
arghbleargh
Your algorithm seems OK, but what was the "traditional approach" that you
compared it to, and how did you compare them? It seems like you actually gain
a lot from full document search (e.g. products with multi-paragraph
descriptions). Otherwise, you might as well just do a SQL query to get your
results.

~~~
jlemoine
By traditional approach we mean all engines that use a unique score to rank
documents (like all engines based on Lucene).

SQL queries are not well suited to text queries; for example, you have no
notion of tokenization or proximity between words.

We plan to write a blog post on the limits of SQL-based search.
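
To make the tokenization and proximity point concrete, here is a small sketch under stated assumptions: it compares a plain SQL `LIKE` query (run in SQLite, with no full-text extension) against a hand-written token-based check. The `products` table, the sample rows, and the `tokenize` and `min_gap` helpers are hypothetical and only serve the illustration.

```python
import re
import sqlite3


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_gap(word_a: str, word_b: str, text: str):
    """Smallest distance in token positions between two words.

    Returns None when either word is not present as a whole token.
    """
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


names = [
    "Red wine glass",
    "Glass bottle for red or white wine",
    "Hired glassblower",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)", [(n,) for n in names])

# Substring matching: "Hired glassblower" matches because "red" is inside
# "Hired" and "glass" is inside "glassblower".
like_rows = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM products "
        "WHERE name LIKE '%red%' AND name LIKE '%glass%' ORDER BY rowid"
    )
]
print(like_rows)

# Token-based matching with proximity: whole words only, closer is better.
for name in names:
    print(name, "->", min_gap("red", "glass", name))
```

The `LIKE` query returns all three rows with nothing to order them by, while the token-based check rejects "Hired glassblower" and can prefer the row where the two words sit closer together.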

------
jhonovich
How is this different than Swiftype? I ask because I am a current Swiftype
user and am trying to understand what your case to switch might be.

~~~
jlemoine
Swiftype is a great tool to search for webpages.

Our focus is to search in a database (for example products in e-commerce
website, persons in a social network or CRM, ...).

We offer features dedicated to that use case. You can have a look at these
two demos, for example:

- [http://www.algolia.com/ecommerce](http://www.algolia.com/ecommerce)
- [http://demos.algolia.com/rapgenius/](http://demos.algolia.com/rapgenius/)

------
ses
I think improving on relevance ranking configuration would be a big boost to
this product as well as offering some ability to cross-search multiple
indexes. Both are quite difficult problems to solve well in search, but if a
simple API service was available that might be attractive for larger
commercial customers.

The icing on the cake would be to have some support for relational (at least
partially relational) data and multimedia / files. Good luck!

------
nodesocket
First of all, great job guys. The library support is fantastic (node.js,
python, ruby, php, even a shell client). We are currently pushing our nginx
logs to ElasticSearch and were going to use ES for some new features on
[https://commando.io](https://commando.io), but instead we will use Algolia.

------
DanBC
Just in case anyone from algolia is reading: search is sub-optimal on mobile

[https://news.ycombinator.com/item?id=7245140](https://news.ycombinator.com/item?id=7245140)

~~~
redox_
Yep right, filtering options are currently hidden on mobile devices, showing
only stories. I'll try to improve that later today.

------
indiehamad
How does it work with API calls? How many calls are typically made by a
real-time search for, say, a 10-letter keyword?

~~~
ndessaigne
We usually recommend performing one query (one API call) per keystroke,
starting from the first one. The actual number of calls depends a lot on the
use-case. Our ranking takes into account both relevance and popularity to
suggest the best result first which greatly reduces the number of letters you
need to type. In use-cases where there is a very strong popularity indicator,
like the number of followers for TV shows, we usually get the correct result
at the first keystroke (b -> breaking bad, d -> dexter). At the other extreme,
you may need to type several words.
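
A toy simulation of the one-query-per-keystroke pattern is sketched below. It runs against an in-memory dictionary rather than any real API; the follower counts, the `search` function, and the `api_calls_until_top` helper are invented for illustration. It counts how many calls are made before the wanted show reaches the top of the results.

```python
# Hypothetical follower counts used as the popularity signal.
SHOWS = {
    "Breaking Bad": 900,
    "Dexter": 800,
    "Doctor Who": 700,
    "Bones": 300,
    "Banshee": 100,
}


def search(query: str) -> list[str]:
    """Prefix search: every query token must prefix some word of the name.

    Results are ordered by follower count, most popular first.
    """
    tokens = query.lower().split()
    hits = [
        name
        for name in SHOWS
        if all(any(w.startswith(t) for w in name.lower().split()) for t in tokens)
    ]
    return sorted(hits, key=lambda name: -SHOWS[name])


def api_calls_until_top(target: str):
    """Issue one search per keystroke; return the call count once target is first."""
    for typed in range(1, len(target) + 1):
        results = search(target[:typed])
        if results and results[0] == target:
            return typed
    return None


for show in SHOWS:
    print(show, "->", api_calls_until_top(show), "call(s)")
```

In this toy data set the most popular show for each initial letter is found after a single call, while a less popular show such as "Banshee" needs three. For a 10-letter keyword the upper bound is 10 calls, and a user who stops typing once the result appears makes fewer.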

~~~
indiehamad
Got it, thanks.

