
MeiliSearch: Zero-config alternative to Elasticsearch, made in Rust - qdequelen
https://github.com/meilisearch/MeiliSearch
======
pqdbr
I'm impressed.

I have a database with 15k documents, each with around 70 pages of text, HTML
formatted.

I'm using ElasticSearch currently, with the Searchkick gem.

30 min playing with MeiliSearch. So far:

- Blazing fast to index, like 10x more performant than using ElasticSearch /
Searchkick;

- Blazing fast to search, at least 3x faster in all my random tests so far;

- Literally zero config;

- Uses 140MB of RAM currently, while in my experience ElasticSearch would
crash with anything less than 1GB, and needs at least 1.5GB to be usable in
production.

~~~
pqdbr
Since this got upvoted and I see the devs are replying to questions, here are
some! I'm also going to point out how ElasticSearch works for comparison.

- The docs state that `Only a single filter is supported in a query`. This is
kind of a dealbreaker for my use case, since I need at least a `user_id` and a
`status` filter. ElasticSearch can work with multiple filters. Also, I don't
understand why you call it `filters` instead of `filter` then. Are multiple
filters on the roadmap?

- My search UI has a sort-by `<select>`, where you can choose, for instance,
`last updated asc` or `last updated desc`, amongst others. In my
understanding, that would be cumbersome with MeiliSearch, since it would
require either (1) a settings change to alter the ranking rules order
beforehand [0], which would not even work in production due to race
conditions, or (2) maintaining multiple indexes, each with a pre-defined
ranking rule order, and switching between them depending on the UI criteria?

- As an extension of the last question, I see that a lot of what you call
"search settings" are considered query parameters by ElasticSearch. For
instance, I can easily query ES on the title or description fields just by
setting that as a parameter. In MeiliSearch that would require a change in the
index settings beforehand, right?

PS: The docs, especially for the Ruby SDK, could use some work in the filters
section. It took me a while to understand I should pass a string, like
`index.search("query", filters: "user_id:3")`. I was trying a hash like
`filters: { user_id: 3 }`.

[0]
[https://docs.meilisearch.com/references/ranking_rules.html#u...](https://docs.meilisearch.com/references/ranking_rules.html#update-ranking-rules)

~~~
qdequelen
Hi, many answers to these questions. But first, I'll point you to our public
roadmap. A lot of the stuff we're working on is in there. If you need/love a
feature, please add a heart emoji to it.
[https://github.com/orgs/meilisearch/projects/2](https://github.com/orgs/meilisearch/projects/2)

- Currently, we only support a single filter. The multiple-filters option is
coming soon.
[https://github.com/meilisearch/MeiliSearch/issues/425](https://github.com/meilisearch/MeiliSearch/issues/425)

- Custom ranking rules on the fly are something we can imagine for our
solution. We haven't done it yet because it complicates the search query
parameters. We are waiting for feedback like yours to implement this kind of
feature.

- Returning only the fields you need is already possible during the search:
[https://docs.meilisearch.com/guides/advanced_guides/search_p...](https://docs.meilisearch.com/guides/advanced_guides/search_parameters.html#attributes-to-retrieve).
As for restricting which attributes are searched in during the query: we had
this feature in a previous version, but, as with the last answer, no one used
it, and it complicated the search query.

------
heipei
I know the project doesn't claim it, but the title somewhat implies this: I
honestly don't understand people claiming ElasticSearch is hard to operate,
especially not at small scales. If anything, ElasticSearch has been one of
the easiest pieces of infrastructure for me to operate, pretty much "zero-
config". Let me elaborate: You can run ElasticSearch via the Docker command
line; if you want a cluster, you just supply the IPs of the other nodes. Then
you start indexing documents with simple HTTP calls. You can add or remove
nodes at any time and don't have to do anything but start another
ElasticSearch instance. If you run out of space or performance, just start
another node. Everything needed for management, indexing, and search is
available through HTTP APIs, no tools needed.

Clustered ElasticSearch has been rock-solid for me and I've used it in anger
many times. The level of maintenance needed is close to zero, both initially
and long-term. Compare that with the abysmal experience of setting up a
sharded MongoDB cluster for example...

Please enlighten me how ElasticSearch is "a lot of work to operate" (heard
that one multiple times), and what you're comparing it to.

~~~
jniedrauer
I've been bitten by elasticsearch twice in my career, and I've seen others
bitten by it as well. Once you put it in production, you can't just run it
from docker on your workstation. You have to set up a cluster with enough
capacity for whatever load you're going to throw at it, gracefully handle
failures, updates, scaling up as load increases, etc.

There are so many switches and dials to tune, and unless you really learn it
in depth, you won't know which ones you need. It's difficult to even determine
what hardware requirements you have. And it's a hard sell to tell your
business guys "I think elasticsearch will work better if we give it more...
CPU? Memory? Disk speed? I'm not really sure." and can't provide any concrete
metrics to back that up.

Another place where footguns abound is upgrading from one version to another,
_especially_ if you've got plugins installed. There are tricks that you have
to learn the hard way.

At this point, I think long and hard before reaching for a solution like
elasticsearch. If I've got a DBA whose entire job it is to master the tech and
wield it expertly, that's one thing. But if I'm part of an early stage
startup, I just can't justify the lost time and potential for catastrophe.

~~~
natefox
> Once you put it in production, you can't just run it from docker on your
> workstation.

But that's true of any data store, and an RDBMS is no different. They all
need HA/replication, and that is rarely trivial.

Honestly, I think this is why managed/hosted solutions (AWS RDS for example)
are so popular - they remove a large part of the complexity for you.

~~~
hashhar
Not all data stores. You can go quite far with an out-of-the-box Redis
instance or even PostgreSQL. No fiddling needed unless you are in triple-
digit QPS ranges.

------
ghh
I wanted to mention Sonic [1] as another lightweight document indexing
alternative written in Rust, when I found that MeiliSearch provides a
thoughtful comparison page [2].

[1]
[https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)

[2]
[https://docs.meilisearch.com/resources/comparison_to_alterna...](https://docs.meilisearch.com/resources/comparison_to_alternatives.html#comparisons)

------
beagle3
Mostly "made in Rust", but from the GitHub README[0]: "MeiliSearch uses LMDB
as the internal key-value store. The key-value store allows us to handle
updates and queries with small memory and CPU overheads." So a lot of the
credit goes to LMDB, and the safety implied by "made in Rust" is not, in
fact, guaranteed.

Not that I'm complaining - I love LMDB, and it's been rock solid and bug-free
in my experience (thanks, Howard!) - but it's low-level C, not Rust, and if
you expect the certainty that Rust provides w.r.t. security, race conditions,
and leaks, be aware that you are not completely getting it.

But other than that: Thanks! This looks like a great project!

[0] [https://github.com/meilisearch/MeiliSearch#how-it-works](https://github.com/meilisearch/MeiliSearch#how-it-works)

~~~
burntsushi
True, but there are significant components in pure Rust, such as `fst` (full
disclosure: I wrote it), which is written in purely safe Rust.

> and safety implied by "made in Rust" is not, in fact, guaranteed

Just about every Rust program depends on some C code, usually at least in the
form of a libc. So you could lodge this criticism against almost every Rust
project.

> and if you expect the certainty that Rust provides w.r.t to security, race
> conditions and leaks

Rust's safety story covers neither race conditions nor leaks.

~~~
comex
> Rust's safety story covers neither race conditions nor leaks.

It covers a type of race condition, namely unsynchronized concurrent access to
memory.

~~~
burntsushi
Data races and race conditions are orthogonal, according to some:
[https://blog.regehr.org/archives/490](https://blog.regehr.org/archives/490)

I think you've disagreed with this in the past, and I don't know how to
resolve that. But certainly, I think we can agree that saying that Rust's
safety story prevents race conditions is, at minimum, very imprecise.

~~~
comex
Oops, I wasn't trying to reopen an argument. I didn't recall us having
discussed it before (still don't, but I forget things easily).

And yeah, I'd agree with your last sentence.

~~~
burntsushi
Ah yeah, it could have been someone else... Not sure. It was a while ago.
Anyway, I don't personally have a strong opinion on definitions here. (I'm
not qualified to.)

------
ghayes
The goal of ElasticSearch, I always thought, was that it scales horizontally
and can handle the loss of multiple nodes without availability or data loss.
It's interesting to build a single-server replacement, and this can likely
work for many use cases, but it's definitely a different approach from
ElasticSearch itself.

~~~
tpayet
Replication for MeiliSearch is on its way :) The main differentiator is that
MeiliSearch's algorithms are made for end-user search, not for complex
queries. MeiliSearch focuses on site search and app search, not analytics on
hyper-large datasets.

~~~
rjammala
What is the size of the largest dataset that you have indexed with
MeiliSearch?

~~~
tpayet
We are currently working with this dataset:
[https://data.discogs.com/?prefix=data/2020/](https://data.discogs.com/?prefix=data/2020/)

It's a dataset of 107M songs, 7.6 GB of compressed files, which represents
250 GB of disk usage by MeiliSearch. We are indexing the release, song, and
artist names.

We also work with a dataset of 2M cities that we can index in less than 2
minutes when the db uses 3 shards.

------
MuffinFlavored
The real power of Elasticsearch for me is the ability to filter logs by:

1. exact match this nested JSON field (with support for lists of values)

2. negative match this nested JSON field (with support for lists of values)

coupled with the ability to filter by "timeframe", then pump it through to
visualizations (tables/graphs) in Kibana.

MeiliSearch would be cool if it spoke the API Kibana expects from
Elasticsearch.
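
For reference, the two filter shapes described above look roughly like this in
Elasticsearch's bool query DSL. This is just a sketch: the field names
(`kubernetes.labels.app`, `log.level`, `@timestamp`) are made up, while the
`bool`/`filter`/`must_not`/`terms`/`range` constructs are the standard DSL.

```python
import json

# Hypothetical log fields; the query structure is standard ES query DSL.
query = {
    "query": {
        "bool": {
            "filter": [
                # 1. exact match on a nested JSON field, with a list of values
                {"terms": {"kubernetes.labels.app": ["api", "worker"]}},
                # "timeframe" filter over the timestamp field
                {"range": {"@timestamp": {"gte": "now-1h", "lte": "now"}}},
            ],
            "must_not": [
                # 2. negative match on a nested JSON field
                {"terms": {"log.level": ["debug", "trace"]}},
            ],
        }
    }
}

# This JSON body is what you would POST to the _search endpoint.
body = json.dumps(query)
```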

~~~
mleonhard
If only one could set up Elasticsearch and Kibana using infrastructure-as-code
(IaC). I spent several days trying and still haven't succeeded. Elasticsearch
config is full of foot-guns.

There are tons of easy setup examples but they lack access control and
encryption. All of my servers must write logs. When one of them gets cracked,
the attacker must not be able to read all the other servers' logs and steal
all the PII. An attacker can use an ARP attack to MITM server connections to
Elasticsearch. Without encryption, that attack yields all the PII.

I hope Meilisearch can someday help fill this gap in the free DevOps toolset.

------
bryanrasmussen
OK, I just looked through things a bit, but the phrase "zero config" worries
me. First off, I could conceivably run ElasticSearch with zero configuration,
but then it needs to make decisions as to what types things are and how
things should be analyzed, and sometimes those decisions are not what I want.

Often ElasticSearch makes a mistake in typing because the programmer has made
a mistake in the data format; if you fixed that mistake, your data would no
longer fit the format that ElasticSearch has chosen for it. (I actually don't
know if this is still a problem, because it has been years since I have run
without all my fields being mapped first, but I don't see how it couldn't
be.)

So theoretically, if you didn't want to go through the trouble of defining a
mapping, you could just reindex all your data, fixed in such a way that
ElasticSearch will choose a better type for individual fields, but why would
you do this?

And I mean, what does MeiliSearch do? I wonder, because looking through this
code here
[https://github.com/meilisearch/MeiliSearch/blob/master/meili...](https://github.com/meilisearch/MeiliSearch/blob/master/meilisearch-schema/src/fields_map.rs)
(and, not being a Rust guy, my understanding of it is probably off), it seems
like maybe it is no-configuration because it expects you to follow its
semantics. Which, to be fair, lots of things do; at the base level,
everything has a title, description, date.

Search Engines are generally configurable because you want to add other fields
and rank hits in those fields higher than other things, or maybe do a specific
search that only targets those fields - like say Brands based search.

On preview: lots of other people with similar views, it seems. I got maybe a
bit ranty just because the title sets me off when it's so wrong it almost
seems like lying.

------
manigandham
Awesome, glad to see all the competition in the search space now. There are
other projects like Sonic, Tantivy, Toshi and more that have more
functionality if you need alternatives.

Here's a public list of search projects (in rust, c, go):
[https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...](https://gist.github.com/manigandham/58320ddb24fed654b57b4ba22aceae25)

~~~
tmzt
Are there any that fit the log-searching use case, apart from Loki, which
doesn't do full-text searching?

~~~
DarkCrusader2
I think vector[0] does what you are asking for.

[0] [https://github.com/timberio/vector](https://github.com/timberio/vector)

~~~
tmzt
Thanks, but I'm looking for alternatives to the ElasticSearch in that diagram.

Vector looks like a good choice to fill the index, and should be easy enough
to modify to support Meili.

------
seemslegit
Hardly an "alternative to Elasticsearch", if only because the latter is
scalable beyond a single machine.

This overhyped description coupled with on-by-default analytics suggests to me
MeiliSearch should be dismissed regardless of potential usefulness or
technical merit.

~~~
greendave
The analytics seem pretty benign.

"We send events to our Amplitude instance to be aware of the number of people
who use MeiliSearch. We only send the platform on which the server runs once
by day. No other information is sent. If you do not want us to send events,
you can disable these analytics by using the MEILI_NO_ANALYTICS env variable."

~~~
seemslegit
The practice itself is malignant; either explicitly ask upon first run or
require a MEILI_YES_ANALYTICS env variable to enable it.

~~~
turdnagel
That would require configuration. This is zero-config.

~~~
computerex
It'd still be zero-config to provide its primary function. I don't think
anyone would say anything against MeiliSearch, or not consider it zero-
config, had they decided to enable analytics off an env var rather than
having analytics be sent by default.

~~~
qdequelen
I would like to clarify that by analytics we are only talking about 1 ping per
day that sends a hash that allows us to uniquely identify a user. The privacy
of the users is kept. It just serves us to know if our work is being used.

~~~
seemslegit
> I would like to clarify that by analytics we are only talking about 1 ping
> per day that sends a hash that allows us to uniquely identify a user.

This is already pretty invasive - it discloses activity, the number of
machines deployed, and the IP address, which identifies location and often
organizations and individuals (and which is considered protected personal
data per the GDPR, afaik).

> The privacy of the users is kept.

No it isn't, see above - even as the authors you don't get to decide what data
does or doesn't infringe upon the users privacy.

> It just serves us to know if our work is being used.

This is irrelevant. If you want to condition the use of your work on knowing
where and how it is being used, then license it accordingly and abide by the
applicable laws.

------
time0ut
Nice. This looks promising. Very clean API. I like the focus on a narrow use
case.

Do you have any information on security topics like using TLS, client
authentication, etc?

~~~
Kerollmops
Currently we think this kind of security can be handled by a simple nginx
setup, allowing auto-refresh of certificates easily (e.g. certbot). But in
the future we will probably handle that in the engine itself.

~~~
mst
I was thinking it might be nice to be able to have an HMACed token with an
expiry as an option - so e.g. my main http-serving thing could provide one of
those to allow the frontend to read for a bit but kick the user off after half
an hour or whatever if the token isn't refreshed.

I've no issue with offloading SSL to a different process though, I tend to
prefer doing that anyway a lot of the time.
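
The HMACed-token-with-expiry idea above can be sketched in a few lines of
Python. Everything here is illustrative (the secret, the `<expiry>.<hex sig>`
token layout, the function names); it's the kind of check the backend would
issue and the proxy or engine would verify, not an existing MeiliSearch API.

```python
import hashlib
import hmac
import time

# Shared secret known only to the server side (illustrative value).
SECRET = b"server-side-secret"

def issue_token(ttl_seconds, now=None):
    """Return an "<expiry>.<hex signature>" token valid for ttl_seconds."""
    expiry = int((now if now is not None else time.time()) + ttl_seconds)
    sig = hmac.new(SECRET, str(expiry).encode(), hashlib.sha256).hexdigest()
    return f"{expiry}.{sig}"

def verify_token(token, now=None):
    """Reject tokens that are malformed, tampered with, or expired."""
    try:
        expiry_str, sig = token.split(".", 1)
        expiry = int(expiry_str)
    except ValueError:
        return False
    expected = hmac.new(SECRET, expiry_str.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # signature doesn't match: forged or altered
    return (now if now is not None else time.time()) < expiry
```

The frontend would attach the token to each search request, and the user gets
kicked off once the expiry passes unless the backend hands out a fresh one.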

~~~
Kerollmops
I understand what you mean, but is this for a specific use case of a search
engine? I was thinking that this kind of time-restricted token could also be
managed by the nginx instance; our engine doesn't support that for the time
being.

~~~
mst
I _can_ make the nginx instance do that.

Was just thinking for "simplest possible deployment" it would be nice to be
able to have clients hit the meili instance basically directly to take maximum
advantage of the speed.

Note that I'm not saying "this should be priority 1" or anything, I'm already
thinking about how to configure nginx to handle the hmac crap if I try meili
out :)

------
nreece
Looks pretty good. The single filter approach is restrictive though.

We're currently leaning towards Manticore Search[1], which is a fork of Sphinx
Search[2].

[1] [https://manticoresearch.com](https://manticoresearch.com) [2]
[http://sphinxsearch.com](http://sphinxsearch.com)

~~~
qdequelen
We are working on multi filters and faceting.
[https://github.com/meilisearch/MeiliSearch/issues/424](https://github.com/meilisearch/MeiliSearch/issues/424)
[https://github.com/meilisearch/MeiliSearch/issues/425](https://github.com/meilisearch/MeiliSearch/issues/425)

------
otterley
MeiliSearch appears to be more of an alternative to Lucene than it is to
Elasticsearch. Lucene is the search engine that runs on a single instance; ES
is the horizontally-scalable distribution and aggregation layer atop the
instances. Absent a similar aggregation layer, MeiliSearch isn't "elastic" as
the comparison implies.

~~~
tpayet
Actually, Lucene is the search library that Elasticsearch uses under the
hood. Lucene does not provide any HTTP API, which Elasticsearch does. Before
using Lucene, you have to build the interface around it.

In this way MeiliSearch is comparable to ES, especially for site search and
app search, working out of the box with its HTTP API.

MeiliSearch does not offer distribution yet, but it is something the team is
working on :)

~~~
otterley
My concern is that by comparing it to Elasticsearch, you implicitly minimize
the amount of engineering effort required to go from single-node to a
distributed system. It is a non-trivial exercise that you will undoubtedly
realize once you get into the dirty details.

------
Bedon292
While this might be an alternative for that one specific use case (a search
bar), it does not feel like a viable alternative to ES. I am sure it is great
at that specific case, and I don't want to knock them on that. But I have
never used ES for a simple search like they are describing. When I use ES, I
want to store billions of records redundantly and search them by text, time,
and/or location, and then create visualizations with the results.

When I first read the title I thought it might be a Rust based Lucene engine
or something, and thought that would be pretty cool. Though no idea how that
would work. On its own, this is a pretty nifty little tool, however I think
the framing as an ES alternative is what feels wrong to me, and apparently
others in the comments as well.

~~~
nicoburns
[https://github.com/tantivy-search/tantivy](https://github.com/tantivy-search/tantivy)
is a Rust-based Lucene-alike.

------
dalore
Wow, seeing this pop up is strange, as I was just implementing this
yesterday.

It is blindingly fast and easy to set up.

------
mleonhard
> MeiliSearch can serve multiple indexes, with different kinds of documents,
> therefore, it is required to create the index before sending documents to
> it.

[https://github.com/meilisearch/MeiliSearch#create-an-index-a...](https://github.com/meilisearch/MeiliSearch#create-an-index-and-upload-some-documents)

Indexes are config. This is not really zero-config if you require API calls
before it can receive data.

Also, there's nothing about TLS or access control. These will be required for
any production deployment. At the minimum, let us specify a TLS key.pem and
cert.pem file and create write-only and read-only access tokens.

------
karterk
If you are looking for alternatives, check out Typesense as well:

[https://github.com/typesense/typesense](https://github.com/typesense/typesense)

------
maxpert
How does it compare to Sonic
[https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)

------
niyazpk
Does anyone know if this supports bulk indexing? My team has a lot of data in
S3 in Parquet format. (We could change the format to something else if that
helps.)

It would be really nice to be able to point tools like MeiliSearch or
ElasticSearch at a data location and have them index all the data without me
writing code to send individual records to the API.

~~~
Kerollmops
This is not something that MeiliSearch supports currently, but I am working
on making the engine able to index formats other than JSON; I saw great
performance improvements when indexing simple CSVs.

We will probably make MeiliSearch accept different indexable formats (i.e.
CSV, JSON, JSON-lines) in a future version.
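
In the meantime, the CSV-to-JSON conversion can be done client-side in a few
lines. A sketch (the column names are made up; POSTing a JSON array to
`/indexes/<uid>/documents` is the documented way to add documents today):

```python
import csv
import io
import json

# Stand-in for a CSV file pulled from S3; columns are illustrative.
raw = """id,title,artist
1,Nevermind,Nirvana
2,Blue Train,John Coltrane
"""

# Each CSV row becomes one JSON document.
documents = list(csv.DictReader(io.StringIO(raw)))

# POST this body to /indexes/<uid>/documents on the MeiliSearch server.
payload = json.dumps(documents)
```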

------
throw03172019
Are the documents stored on disk or only in memory?

~~~
tpayet
We are using LMDB as the key/value store, so the documents are memory-mapped
(usually on disk, and in memory when needed).

------
dzonga
Looks really easy to use. Will use this instead of resorting to Postgres
full-text search for my next app(s).

------
bradrobertson
Looks promising! Are there any docs coming on a production ready setup?
Reading below it looks like you're working on high availability, but even in
the single machine scenario, do you have recommendations for persistence,
fault tolerance etc?

~~~
Kerollmops
I would say that you must add your own nginx (or similar) in front of our
HTTP-only engine; in terms of fault tolerance, we are working on high
availability.

------
throw03172019
We use Algolia with public API keys that have search filters encoded, so
users can only search their own data (i.e. `account_id:123`).

Is there anything similar here? Otherwise all the queries need to go through
our servers first to ensure the filter is present.

~~~
Kerollmops
The current API key system is a simple and temporary solution.

We will work on a more fully featured API key system, including the one you
are talking about. This is on our roadmap, IIRC.

------
udfalkso
Sounds more like a potential alternative to Sphinx than Elastic Search.

sphinxsearch.com/

------
eliseumds
Pretty heavy user of ES here, and one cannot compare the two products.

~~~
the_arun
What is the rationale for not comparing the two?

~~~
wolco
Only one filter. Very fast limited search. Not great for anything remotely
complex like searching with two conditions.

------
kvz
Is there already a browser library that can talk to MeiliSearch?

~~~
Kerollmops
Yes, there is, you can find all clients on this documentation page:
[https://docs.meilisearch.com/resources/sdks.html](https://docs.meilisearch.com/resources/sdks.html)

Note that we are reworking the js library and there will probably be React
integration too!

------
dhruvkar
I've never used elasticsearch and only had a brief toy project with Algolia.
The demo on the github repo looks awesome.

Can this run on top of my postgres database?

~~~
Kerollmops
To make MeiliSearch expose the documents that are stored in your PostgreSQL
(or any other database), you must extract them and store them in our engine
using the HTTP API we provide:
[https://docs.meilisearch.com/references/documents.html#add-o...](https://docs.meilisearch.com/references/documents.html#add-or-replace-documents)

For that you will also need to define the different attributes your document
is composed of.

We have thought about providing a simple tool to extract the documents from
an SQL table into MeiliSearch directly.
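
The extract-and-push flow can be sketched like this, using an in-memory SQLite
table as a stand-in for PostgreSQL (the table and column names are invented;
the resulting JSON array is what you would POST to the documents endpoint):

```python
import json
import sqlite3

# In-memory SQLite as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
    [(1, "Hello", "First post"), (2, "World", "Second post")],
)

# Turn each row into a JSON document keyed by column name.
conn.row_factory = sqlite3.Row
rows = conn.execute("SELECT id, title, body FROM articles").fetchall()
documents = [dict(row) for row in rows]

# POST this body to /indexes/<uid>/documents on the MeiliSearch server.
payload = json.dumps(documents)
```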

------
bberenberg
Does any tool in this category (This, Elastic, or whatever else) support
something like permissions on a per document level?

~~~
jschumacher
Hey B! Funny seeing you here. I'm now running product at
[http://sajari.com](http://sajari.com)

You will find that most tools provide document level permissions to some
degree by storing user/group IDs on the document and adding filters to the
query. However, it generally requires custom implementation work to integrate
it into your systems and prevent spoofing of the filters.

Hope you're doing well!
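
The pattern described above can be sketched in a few lines: the server builds
the search request itself, injecting a permission filter derived from the
authenticated user so clients never supply (or spoof) it. The helper name is
made up; the `filters: "field:value"` string matches MeiliSearch's current
single-filter syntax.

```python
def build_search_params(authenticated_user_id, query):
    """Server-side: combine the user's query with an enforced permission filter."""
    return {
        "q": query,
        # Derived from the authenticated session, never from client input,
        # so a client cannot substitute another user's ID.
        "filters": f"user_id:{authenticated_user_id}",
    }

params = build_search_params(3, "quarterly report")
```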

------
social_quotient
Thanks for including performance metrics right up front!

