Pretty cool, how does it compare to its Rust counterpart https://crates.meilisearch....

tpayet · on Jan 29, 2020

Apart from being written in Rust, MeiliSearch (https://github.com/meilisearch/meilisearch) differs mostly in its use of a bucket sort to rank the documents retrieved from the index.

Both MeiliSearch and Typesense use an inverted index with a Levenshtein automaton to handle typos, but when it comes to sorting documents:

- Typesense uses a default_sorting_field on each document. This means that before indexing your documents you need to compute a relevancy score so that Typesense can sort them based on your needs (https://typesense.org/docs/0.11.1/guide/#ranking-relevance)

- MeiliSearch, on the other hand, uses a bucket sort, which means that there is a default relevancy algorithm based on the proximity of words in the documents, the fields in which the words are found and the number of typos (https://docs.meilisearch.com/guides/advanced_guides/ranking....). You can still add your own custom rules if you want to alter the default search behavior.
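For readers who want to picture the bucket sort, here is a rough Python sketch of the idea. The `Match` fields, the rule order and the sample data are made up for illustration; this is not MeiliSearch's actual code or its exact rule set.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Match:
    doc_id: str
    typos: int           # typos needed to match the query (lower is better)
    proximity: int       # distance between matched query words (lower is better)
    field_rank: int      # position of the matched field (0 = most important)
    popularity: int = 0  # hypothetical custom attribute


Rule = Callable[[Match], int]


def bucket_sort(matches: List[Match], rules: List[Rule]) -> List[Match]:
    """Group matches into buckets with the first rule, then break ties
    inside each bucket with the remaining rules, recursively."""
    if not rules or len(matches) <= 1:
        return list(matches)
    first, rest = rules[0], rules[1:]
    buckets = {}
    for m in matches:
        buckets.setdefault(first(m), []).append(m)
    ranked = []
    for key in sorted(buckets):  # lower key = better bucket
        ranked.extend(bucket_sort(buckets[key], rest))
    return ranked


# Assumed default rules, followed by one custom rule appended at the end.
default_rules = [lambda m: m.typos, lambda m: m.proximity, lambda m: m.field_rank]
custom_rules = default_rules + [lambda m: -m.popularity]

matches = [
    Match("a", typos=1, proximity=1, field_rank=0),
    Match("b", typos=0, proximity=3, field_rank=1),
    Match("c", typos=0, proximity=1, field_rank=2, popularity=5),
    Match("d", typos=0, proximity=1, field_rank=2, popularity=9),
]
ranked_ids = [m.doc_id for m in bucket_sort(matches, custom_rules)]
```

In this sketch the custom rule only matters for documents that are still tied after the earlier rules, which is why adding it does not override the default relevancy.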

karterk · on Jan 30, 2020

> Typesense uses a default_sorting_field on each document. This means that before indexing your documents you need to compute a relevancy score so that Typesense can sort them based on your needs

This is not entirely true. You can just use a field with a constant value (say 100) as the default sorting field and Typesense will just use the text-based relevancy. Please update your comment.

The reason why Typesense does insist on one, though, is that it's always a good idea to have a field that indicates the popularity of a record (or a proxy for it). It makes search so much better.
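A minimal sketch of that workaround, in Python. The schema keys (`fields`, `default_sorting_field`) follow the Typesense 0.11 docs linked in this thread; the collection name, the `constant_score` field and the helper function are hypothetical, and no Typesense client is called here.

```python
CONSTANT_SCORE = 100  # same value on every record, so it never changes the ordering

# Collection schema as a plain dict; "articles" and "constant_score" are made-up names.
schema = {
    "name": "articles",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "constant_score", "type": "int32"},
    ],
    "default_sorting_field": "constant_score",
}


def with_constant_score(doc: dict) -> dict:
    """Return a copy of the document with the constant sorting field added."""
    return {**doc, "constant_score": CONSTANT_SCORE}


docs = [with_constant_score(d) for d in ({"title": "Rust search"}, {"title": "C++ search"})]
```

Since every document carries the same value, the sorting field cannot separate any two results, which leaves the text match as the only signal.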

KajMagnus · on Jan 29, 2020

How would you say Typesense and MeiliSearch compare with Tantivy + Toshi?

(those two are a bit like Lucene + ElasticSearch — but written in Rust)

tpayet · on Jan 29, 2020

Lucene was written for public search engines like Google or DuckDuckGo (which is actually based on Lucene and Solr).

Lucene and Lucene-like projects (Tantivy or Bleve in Golang) are general-purpose search libraries. They can handle enormous datasets, and you can make very complex queries on them (compute the average age of people named Karl in a certain type of document for example).

These libraries are based on the tf-idf (term frequency, inverse document frequency) algorithm and handle typos quite poorly, for example (unless you set them up to index and parse your documents differently).
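To make the tf-idf point concrete, here is a small sketch using one common textbook variant of the formula (raw term frequency times the log of inverse document frequency). It is not how Lucene or Tantivy score documents internally, and the corpus is invented.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Score `term` for one tokenized document against a tokenized corpus."""
    tf = Counter(doc)[term] / len(doc)        # how frequent the term is in this document
    df = sum(1 for d in corpus if term in d)  # how many documents contain the term
    if df == 0:
        return 0.0                            # unknown term: nothing to score
    idf = math.log(len(corpus) / df)          # rarer terms weigh more
    return tf * idf


corpus = [
    "fast search engine".split(),
    "search engine written in rust".split(),
    "rust programming language".split(),
]

score_exact = tf_idf("rust", corpus[1], corpus)
score_typo = tf_idf("rsut", corpus[1], corpus)  # misspelled query term
```

The misspelled term never appears in the corpus, so its score is zero: exact-term scoring alone gives no typo tolerance.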

Toshi is to Tantivy what Elasticsearch is to Lucene: it provides sharding and a JSON-over-HTTP API.

You can use Lucene and its derivatives for basically any search-related project, but you may have to dive into how it works and understand concepts like tokenization or n-grams to tune it according to your needs.

On the other hand, MeiliSearch (and I guess Typesense, but I cannot speak for them) focuses on a subset of what you could build with Lucene or Elastic.

It is a fully functional RESTful API, made for instant search or search-as-you-type. The algorithms behind MeiliSearch are simply different: an inverted index, with a Levenshtein automaton to handle typos, then a bucket sort you can tune for the ranking of the returned documents. The aim is to provide an easier go-to solution for implementing customer-facing search.
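A toy illustration of the first two pieces, in Python. Note the swap: instead of a real Levenshtein automaton, this sketch computes a plain dynamic-programming edit distance against every indexed word, which gives the same matches on a tiny vocabulary but would be far too slow at scale. All names and documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def edit_distance(a, b):
    """Classic Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def search(index, query, max_typos=1):
    """Return {doc_id: smallest number of typos that matched the query word}."""
    hits = {}
    for word, doc_ids in index.items():
        typos = edit_distance(query.lower(), word)
        if typos <= max_typos:
            for doc_id in doc_ids:
                hits[doc_id] = min(typos, hits.get(doc_id, typos))
    return hits


docs = {1: "instant search engine", 2: "relational database engine", 3: "searching logs"}
index = build_inverted_index(docs)
hits = search(index, "serch")
```

The typo count returned per document is the kind of signal a later ranking step (such as the bucket sort) can use.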

You won't be able to make super complex queries on terabytes of data. We just make super fast and ultra relevant search for end users.

Typesense and MeiliSearch focus on the same usage; we chose Rust for performance, security and the modern ecosystem that will allow easier maintenance :D

fulmicoton · on Jan 30, 2020

tantivy main dev here. Just chiming in to confirm this is an accurate answer.

KajMagnus · on Jan 30, 2020

Thanks @tpayet and @fulmicoton for the info :- )

@fulmicoton and @tpayet, I'm thinking then if I want both full text search, and also faceted search, then Tantivy can do that, but at this time, MeiliSearch (and Typesense) don't do that?

(When I look here: https://github.com/meilisearch/MeiliSearch in the features list, I see no mention of faceted search, whilst Tantivy does list faceted search as a feature: https://github.com/tantivy-search/tantivy)

@tpayet, for a database like MeiliSearch, is faceted search typically always off-topic? Or are you thinking about adding faceted search later on?

(My use case is 1) full-text search, 2) typo-friendly, 3) with e.g. "begin" also matching "began" and "begun", and "run" also matching "running", 4) in all languages, and 5) faceted search restricted to tags, categories and user groups.)

karterk · on Jan 30, 2020

Typesense does support faceted search. Look for the `facet_by` example in this section: https://typesense.org/docs/0.11.1/api/#search-collection
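For anyone unfamiliar with the term, this is roughly what a faceted search returns: the matching documents plus a count per facet value. The sketch below is a conceptual in-memory illustration in Python with invented field names; it does not call Typesense and says nothing about how Typesense implements `facet_by`.

```python
from collections import Counter


def facet_counts(docs, query, facet_field):
    """Return the matching documents and how many of them fall under each facet value."""
    matched = [d for d in docs if query.lower() in d["title"].lower()]
    counts = Counter(d[facet_field] for d in matched)
    return matched, dict(counts)


docs = [
    {"title": "Rust search engine", "category": "search"},
    {"title": "Search in C++", "category": "search"},
    {"title": "Searching for typos", "category": "blog"},
    {"title": "Database internals", "category": "storage"},
]

matched, counts = facet_counts(docs, "search", "category")
```

A UI would typically show those counts next to each category so users can narrow the results.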

KajMagnus · on Feb 2, 2020

Thanks! Based on what I read about Typesense, I'm thinking this faceted search happens in memory (so one would want a fair amount of RAM).

tpayet · on Jan 30, 2020

You are welcome.

MeiliSearch does not offer faceted search yet. It is one of the key features that we are still missing, but we plan to work on it in the coming weeks.

For your use case today, I suggest you use Typesense if it fits your needs (they handle faceted search already) or Tantivy or Toshi.

To manage different languages, you should make one index per language. MeiliSearch and Tantivy handle kanji! We will add faceted search in the coming weeks if you are not in a hurry :D

KajMagnus · on Feb 2, 2020

Thanks for the info :- )

As of now, my project uses ElasticSearch — it works fine, but it wants lots of RAM which I find slightly annoying, ... and the new Java v 9, 10, 11 etc feels a bit worrying. — So, as of now I'm just staying up-to-date with new Rust based search engines.

ddorian43 · on Jan 29, 2020

DuckDuckGo isn't based on Lucene/Solr but on the Bing API.

tpayet · on Jan 29, 2020

My bad, my information could be outdated.

Based on http://highscalability.com/blog/2013/1/28/duckduckgo-archite...:

> The fat tail queries go against PostgreSQL and the long tail queries go against Solr. For shorter queries PostgreSQL takes precedence. Long tail fills in Instant Answers where nothing else catches.

It seems that Bing is now a part of their sources indeed: https://help.duckduckgo.com/results/sources

bmn__ · on Jan 30, 2020

Both are used. Source: Torsten Raudssus who works for DDG as developer liaison.

ddorian43 · on Jan 30, 2020

Are you saying they do web search in Lucene/Solr? How many TB do they have in Solr?

Dowwie · on Jan 30, 2020

Have you contrasted MeiliSearch with Sonic?

tpayet · on Jan 30, 2020

Yes :) Valerian Saliou, the maintainer of Sonic, is a friend of ours. He built Sonic mainly for his company (crisp.chat) and, compared to MeiliSearch, there is no relevancy ranking.

Sonic is "just" an inverted index with a Levenshtein automaton. It returns only the ids of the documents that contain your query words; you then have to retrieve the full documents from your main database, and only then can you apply some relevancy ranking yourself.
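The two-step flow described above can be sketched like this in Python. The dictionaries stand in for the id-only index and for the main database; none of this is Sonic's real API, and ranking by `views` is just an arbitrary example of a do-it-yourself relevancy rule.

```python
# Stand-in for an id-only search index: query word -> matching document ids.
id_index = {"rust": ["doc2", "doc1", "doc3"]}

# Stand-in for the main database that holds the full documents.
main_db = {
    "doc1": {"title": "Rust search engines", "views": 120},
    "doc2": {"title": "Learning Rust", "views": 45},
    "doc3": {"title": "Rust and Levenshtein automata", "views": 300},
}


def search_ids(term):
    """Step 1: the index returns ids only, in no meaningful order."""
    return id_index.get(term.lower(), [])


def search_and_rank(term):
    """Step 2: fetch the full documents, then apply our own ranking."""
    ids = search_ids(term)
    docs = [main_db[i] for i in ids if i in main_db]
    return sorted(docs, key=lambda d: d["views"], reverse=True)


results = search_and_rank("Rust")
```

The extra round trip to the main database and the hand-written ranking are the parts an engine with built-in relevancy takes care of for you.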

Dowwie · on Jan 31, 2020

Thanks for sharing this info! You may want to add these useful comparisons to your Readme. I will keep an eye on MS and wish you and team the best.

tpayet · on Jan 31, 2020

Thank you so much, we will definitely add these comparisons!