

Full text search in your Data: How you can do better than Elasticsearch - arthurlenoir
http://blog.algolia.com/full-text-search-in-your-database-algolia-versus-elasticsearch/

======
Mpdreamz
"mixing relevance and popularity is nothing short of impossible in
Elasticsearch. Either you sort by relevance or by using a popularity
attribute, you cannot mix both."

This is a false statement:

[http://www.elasticsearch.org/guide/reference/query-dsl/custo...](http://www.elasticsearch.org/guide/reference/query-dsl/custom-filters-score-query/)

Combined with scripts, this gives you unlimited possibilities to alter your
score based on whatever you please. The syntax is perhaps a bit wonky in the
current version, but awesomeness is on the way:

[https://github.com/elasticsearch/elasticsearch/issues/3423](https://github.com/elasticsearch/elasticsearch/issues/3423)
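The idea behind a custom score is that the final `_score` becomes a function of both the text relevance and a popularity attribute. Below is a minimal sketch of that arithmetic in plain Python, not Elasticsearch's query syntax; the `log1p` dampening, the `weight` parameter and the `combined_score` name are assumptions chosen for illustration.

```python
import math


def combined_score(relevance: float, popularity: int, weight: float = 1.0) -> float:
    """Hypothetical custom score: relevance scaled by a log-dampened popularity.

    log1p(0) == 0, so a document with no popularity keeps its plain relevance.
    """
    return relevance * (1.0 + weight * math.log1p(popularity))


hits = [
    {"title": "popular page", "relevance": 1.0, "popularity": 5000},
    {"title": "obscure page", "relevance": 1.2, "popularity": 10},
]

# Rank by the combined value instead of by relevance alone.
ranked = sorted(
    hits,
    key=lambda h: combined_score(h["relevance"], h["popularity"]),
    reverse=True,
)
print([h["title"] for h in ranked])  # ['popular page', 'obscure page']
```

Note that the slightly better text match loses here because popularity multiplies into the same float, which is exactly the kind of trade-off the reply below the quote is worried about.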

"Unfortunately Elasticsearch fuzzy matching does not work out of the box, is
complex to customize, and does not provide the ability to highlight prefixes."

There are more ways to catch typos than fuzzy queries and Levenshtein
distance, ngrams for instance. Elasticsearch allows you to do both, but yes,
it's true that you have to know your way around analyzers/tokenizers and
mappings a little to get the best results in Elasticsearch. If you use the
ngrams approach, highlighting also works a lot better.
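To make the ngram idea concrete: a word with a typo still shares most of its character ngrams with the correct word. This is a minimal Python sketch of trigram matching scored with Jaccard similarity, standing in for what an ngram tokenizer roughly achieves at index time; the `_` padding, the Jaccard measure and the helper names are illustrative choices, not Elasticsearch's actual scoring.

```python
def char_ngrams(word: str, n: int = 3) -> set:
    """Return the set of character n-grams of a word, padded with '_' at both ends."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two words' n-gram sets (1.0 means identical sets)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


vocabulary = ["elasticsearch", "elastic", "search", "algolia"]
query = "elastcsearch"  # typo: missing 'i'

# The misspelled query still shares 10 of 15 distinct trigrams with "elasticsearch".
best = max(vocabulary, key=lambda term: ngram_similarity(query, term))
print(best)  # elasticsearch
```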

"This sorting configuration might seem pretty explicit, but it is in fact
quite dangerous as it conflicts with the boost on fields. To better understand
the problem, let’s look at the query ‘the rains’:"

It's true that sorting trumps boosting, but given the assumption that you
cannot alter _score, this whole section seems contrived.
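Why sorting trumps boosting is easy to see with two hypothetical hits: when popularity is the primary sort key, the field boosts that raise `_score` can only break ties. A plain-Python sketch with made-up titles and numbers, using tuple keys to stand in for a multi-level sort:

```python
# Hypothetical hits for the query "the rains": (title, boosted _score, popularity)
hits = [
    ("exact title match", 3.0, 120),
    ("weak match in description", 0.4, 9000),
]

# Popularity as the primary sort key: the field boost can only break ties.
by_popularity = sorted(hits, key=lambda h: (h[2], h[1]), reverse=True)

# _score as the primary key, popularity only as a tie-breaker.
by_score = sorted(hits, key=lambda h: (h[1], h[2]), reverse=True)

print([h[0] for h in by_popularity])  # ['weak match in description', 'exact title match']
print([h[0] for h in by_score])       # ['exact title match', 'weak match in description']
```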

In the instant search section they use Elasticsearch's query_string query to
search for `world w*`. This is indeed a very slow approach, since it generates
a wildcard query in the background; they probably should have written the
query as a phrase prefix query.
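The semantics of a phrase prefix query, as opposed to a free-standing wildcard, can be sketched in a few lines: every query term except the last must match exactly and in order, and the last one only as a prefix. This is a hypothetical helper over pre-tokenized text; Elasticsearch's own phrase prefix query runs against its index, not token lists like this.

```python
def phrase_prefix_match(tokens: list, query: str) -> bool:
    """True if the query terms occur consecutively in `tokens`,
    with the last query term matched as a prefix (e.g. "world w")."""
    terms = query.lower().split()
    if not terms:
        return False
    *exact, prefix = terms
    tokens = [t.lower() for t in tokens]
    n = len(terms)
    # Slide a window of len(terms) tokens over the document.
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if window[:-1] == exact and window[-1].startswith(prefix):
            return True
    return False


print(phrase_prefix_match("hello world wide web".split(), "world w"))  # True
print(phrase_prefix_match("world of warcraft".split(), "world w"))     # False
```

The second example is the flip side: a document whose terms are not adjacent is dropped entirely, which is the user-experience concern raised in the reply.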

~~~
jlemoine
Thanks for your feedback; I am the author of the article.

1) You are right that it is possible to mix popularity and relevance, but you
need to use boosts and store everything in the single float _score. This is
dangerous and has side effects (for example, there is a big risk that at some
point a hit with typos ranks above an exact one). It is really difficult to
control ranking with boosts.

2) The ngrams approach is indeed an alternative, but it also has major
drawbacks in terms of relevance, mainly regarding the proximity between terms.

3) A phrase query is a good way to improve performance, but it breaks the user
experience when the terms are not close together (such hits are then missing
from the search results). It's better to let proximity do its job and affect
the ranking.
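Letting proximity affect the ranking instead of acting as a filter can be sketched as follows: compute the smallest token window that contains all query terms (the standard sliding-window "minimum covering window" technique) and sort by it, so documents with distant terms sink instead of disappearing. This is a plain-Python illustration with made-up documents and hypothetical helper names, not a description of how either engine computes proximity internally.

```python
from typing import Optional


def min_span(tokens: list, terms: list) -> Optional[int]:
    """Length of the smallest window of `tokens` containing every term in `terms`.

    Returns None when at least one term never occurs.
    """
    needed = set(terms)
    if not needed:
        return 0
    counts = {}
    covered = 0  # number of distinct needed terms inside the current window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers every term.
        while covered == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            dropped = tokens[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


docs = {
    "a": "world wide web".split(),
    "b": "the world is a wide place".split(),
    "c": "hello world".split(),
}
query = ["world", "wide"]


def proximity_key(doc_id: str) -> float:
    """Smaller span ranks higher; documents missing a term sort last."""
    span = min_span(docs[doc_id], query)
    return float("inf") if span is None else span


ranked = sorted(docs, key=proximity_key)
print(ranked)  # ['a', 'b', 'c']
```

A strict phrase query for "world wide" would return only document "a"; ranking by span keeps "b" in the results, just lower.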

